Datasets for large language models: A comprehensive survey

Y Liu, J Cao, C Liu, K Ding, L Jin - arXiv preprint arXiv:2402.18041, 2024 - arxiv.org
This paper embarks on an exploration of Large Language Model (LLM) datasets,
which play a crucial role in the remarkable advancements of LLMs. The datasets serve as …

Mantis: Interleaved multi-image instruction tuning

D Jiang, X He, H Zeng, C Wei, M Ku, Q Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
Large multimodal models (LMMs) have shown great results in single-image vision language
tasks. However, their ability to solve multi-image visual language tasks is yet to be …

Mllm-as-a-judge: Assessing multimodal llm-as-a-judge with vision-language benchmark

D Chen, R Chen, S Zhang, Y Wang, Y Liu… - … on Machine Learning, 2024 - openreview.net
Multimodal Large Language Models (MLLMs) have gained significant attention recently,
showing remarkable potential in artificial general intelligence. However, assessing the utility …

Embodied multi-modal agent trained by an llm from a parallel textworld

Y Yang, T Zhou, K Li, D Tao, L Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
While large language models (LLMs) excel in a simulated world of texts, they struggle to
interact with the more realistic world without perceptions of other modalities such as visual or …

Multimodal large language models in health care: applications, challenges, and future outlook

R AlSaad, A Abd-Alrazaq, S Boughorbel… - Journal of medical …, 2024 - jmir.org
In the complex and multidimensional field of medicine, multimodal data are prevalent and
crucial for informed clinical decisions. Multimodal data span a broad spectrum of data types …

Expanding performance boundaries of open-source multimodal models with model, data, and test-time scaling

Z Chen, W Wang, Y Cao, Y Liu, Z Gao, E Cui… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce InternVL 2.5, an advanced multimodal large language model (MLLM) series
that builds upon InternVL 2.0, maintaining its core model architecture while introducing …

Halc: Object hallucination reduction via adaptive focal-contrast decoding

Z Chen, Z Zhao, H Luo, H Yao, B Li, J Zhou - arXiv preprint arXiv …, 2024 - arxiv.org
While large vision-language models (LVLMs) have demonstrated impressive capabilities in
interpreting multi-modal contexts, they invariably suffer from object hallucinations (OH). We …

Vhelm: A holistic evaluation of vision language models

T Lee, H Tu, CH Wong, W Zheng… - Advances in …, 2025 - proceedings.neurips.cc
Current benchmarks for assessing vision-language models (VLMs) often focus on their
perception or problem-solving capabilities and neglect other critical aspects such as …

Mme-survey: A comprehensive survey on evaluation of multimodal llms

C Fu, YF Zhang, S Yin, B Li, X Fang, S Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org
As a prominent direction of Artificial General Intelligence (AGI), Multimodal Large Language
Models (MLLMs) have garnered increased attention from both industry and academia …

Mitigating object hallucination via concentric causal attention

Y Xing, Y Li, I Laptev, S Lu - Advances in Neural …, 2025 - proceedings.neurips.cc
Recent Large Vision Language Models (LVLMs) present remarkable zero-shot
conversational and reasoning capabilities given multimodal queries. Nevertheless, they …