UniIR: Training and benchmarking universal multimodal information retrievers

C Wei, Y Chen, H Chen, H Hu, G Zhang, J Fu… - … on Computer Vision, 2024 - Springer
Existing information retrieval (IR) models often assume a homogeneous format, limiting their
applicability to diverse user needs, such as searching for images with text descriptions …

MM-Embed: Universal multimodal retrieval with multimodal LLMs

SC Lin, C Lee, M Shoeybi, J Lin, B Catanzaro… - arXiv preprint arXiv …, 2024 - arxiv.org
State-of-the-art retrieval models typically address a straightforward search scenario, where
retrieval tasks are fixed (e.g., finding a passage to answer a specific question) and only a …

EntityCLIP: Entity-Centric Image-Text Matching via Multimodal Attentive Contrastive Learning

Y Wang, L Wu, L Cheng, Z Zhong, M Wang - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in image-text matching have been notable, yet prevailing models
predominantly cater to broad queries and struggle with accommodating fine-grained query …

CUE-M: Contextual Understanding and Enhanced Search with Multimodal Large Language Model

D Go, T Whang, C Lee, H Kim, S Park, S Ji… - arXiv preprint arXiv …, 2024 - arxiv.org
The integration of Retrieval-Augmented Generation (RAG) with Multimodal Large Language
Models (MLLMs) has expanded the scope of multimodal query resolution. However, current …

Universal Multimodal Retrieval with Multimodal LLMs

SC Lin, C Lee, M Shoeybi, J Lin, B Catanzaro… - … Conference on Learning … - openreview.net
State-of-the-art retrieval models typically address a straightforward search scenario, where
retrieval tasks are fixed (e.g., finding a passage to answer a specific question) and only a …

MIRACLE: Multimodal Image-text Retrieval and Analysis for Contextual Long-form Evaluation

MMM Miah, A Chatterjee, A Mitra, R Huang, M Luo - marworkshop.github.io
Multimodal retrieval is becoming increasingly vital as media platforms often feature content
combining text and images. This is especially prevalent in long-form, information-rich …