Google Scholar

K Sun, Z **e, M Ye, H Zhang - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com

Multimodal intent recognition (MIR) aims to perceive the human intent polarity via language,
visual, and acoustic modalities. The inherent intent ambiguity makes it challenging to …

Cited by 4

[PDF] arxiv.org

X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs

S Swetha, J Yang, T Neiman, MN Rizve, S Tran… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advancements in Multimodal Large Language Models (MLLMs) have revolutionized
the field of vision-language understanding by integrating visual perception capabilities into …

Cited by 2

[PDF] arxiv.org

A Survey of Multimodal Composite Editing and Retrieval

S Li, F Huang, L Zhang - arXiv preprint arXiv:2409.05405, 2024 - arxiv.org

In the real world, where information is abundant and diverse across different modalities,
understanding and utilizing various data types to improve retrieval systems is a key focus of …

Cited by 2

[PDF] amazon.science

X-Former: Unifying contrastive and reconstruction learning for MLLMs

S Sirnam, J Yang, T Neiman, MN Rizve, S Tran… - … on Computer Vision, 2024 - Springer

Recent advancements in Multimodal Large Language Models (MLLMs) have
revolutionized the field of vision-language understanding by integrating visual perception …

ProVLA: Compositional image search with progressive vision-language alignment and multimodal fusion

Contextual augmented global contrast for multimodal intent recognition
