Order matters: Exploring order sensitivity in multimodal large language models

Z Tan, X Chu, W Li, T Mo - arXiv preprint arXiv:2410.16983, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) utilize multimodal contexts consisting of text,
images, or videos to solve various multimodal tasks. However, we find that changing the …

Chimera: Improving generalist model with domain-specific experts

T Peng, M Li, H Zhou, R Xia, R Zhang, L Bai… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in Large Multi-modal Models (LMMs) underscore the importance of
scaling by increasing image-text paired data, achieving impressive performance on general …