Teaching structured vision & language concepts to vision & language models

S Doveh, A Arbelle, S Harary… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision and Language (VL) models have demonstrated remarkable zero-shot performance in
a variety of tasks. However, some aspects of complex language understanding still remain a …

LightningDot: Pre-training visual-semantic embeddings for real-time image-text retrieval

S Sun, YC Chen, L Li, S Wang, Y Fang… - Proceedings of the 2021 …, 2021 - aclanthology.org
Multimodal pre-training has propelled great advancement in vision-and-language research.
These large-scale pre-trained models, although successful, fatefully suffer from slow …

Vision-based real-time process monitoring and problem feedback for productivity-oriented analysis in off-site construction

X Chen, Y Wang, J Wang, A Bouferguene… - Automation in …, 2024 - Elsevier
The widespread adoption of surveillance cameras in work environments has enabled the
direct and non-intrusive detection of productivity-related issues in the field of construction. In …

M3P: Learning universal representations via multitask multilingual multimodal pre-training

M Ni, H Huang, L Su, E Cui, T Bharti… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present M3P, a Multitask Multilingual Multimodal Pre-trained model that combines
multilingual pre-training and multimodal pre-training into a unified framework via multitask …

Retrieve fast, rerank smart: Cooperative and joint approaches for improved cross-modal retrieval

G Geigle, J Pfeiffer, N Reimers, I Vulić… - Transactions of the …, 2022 - direct.mit.edu
Current state-of-the-art approaches to cross-modal retrieval process text and visual input
jointly, relying on Transformer-based architectures with cross-attention mechanisms that …

Multilingual multimodal pre-training for zero-shot cross-lingual transfer of vision-language models

PY Huang, M Patrick, J Hu, G Neubig, F Metze… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper studies zero-shot cross-lingual transfer of vision-language models. Specifically,
we focus on multilingual text-to-video search and propose a Transformer-based model that …

MURAL: multimodal, multitask retrieval across languages

A Jain, M Guo, K Srinivasan, T Chen… - arXiv preprint arXiv …, 2021 - arxiv.org
Both image-caption pairs and translation pairs provide the means to learn deep
representations of and connections between languages. We use both types of pairs in …

Text to image generation: leaving no language behind

P Reviriego, E Merino-Gómez - arXiv preprint arXiv:2208.09333, 2022 - arxiv.org
One of the latest applications of Artificial Intelligence (AI) is to generate images from natural
language descriptions. These generators are now becoming available and achieve …

Cross-lingual cross-modal retrieval with noise-robust learning

Y Wang, J Dong, T Liang, M Zhang, R Cai… - Proceedings of the 30th …, 2022 - dl.acm.org
Despite the recent developments in the field of cross-modal retrieval, there has been less
research focusing on low-resource languages due to the lack of manually annotated …

Assessing multilingual fairness in pre-trained multimodal representations

J Wang, Y Liu, XE Wang - arXiv preprint arXiv:2106.06683, 2021 - arxiv.org
Recently pre-trained multimodal models, such as CLIP, have shown exceptional capabilities
towards connecting images and natural language. The textual representations in English …