Contextual augmented global contrast for multimodal intent recognition

K Sun, Z Xie, M Ye, H Zhang - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
Multimodal intent recognition (MIR) aims to perceive the human intent polarity via language,
visual, and acoustic modalities. The inherent intent ambiguity makes it challenging to …

X-Former: Unifying Contrastive and Reconstruction Learning for MLLMs

S Swetha, J Yang, T Neiman, MN Rizve, S Tran… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in Multimodal Large Language Models (MLLMs) have revolutionized
the field of vision-language understanding by integrating visual perception capabilities into …

A Survey of Multimodal Composite Editing and Retrieval

S Li, F Huang, L Zhang - arXiv preprint arXiv:2409.05405, 2024 - arxiv.org
In the real world, where information is abundant and diverse across different modalities,
understanding and utilizing various data types to improve retrieval systems is a key focus of …

X-Former: Unifying contrastive and reconstruction learning for MLLMs

S Sirnam, J Yang, T Neiman, MN Rizve, S Tran… - … on Computer Vision, 2024 - Springer
Recent advancements in Multimodal Large Language Models (MLLMs) have
revolutionized the field of vision-language understanding by integrating visual perception …