A Survey of Multimodal Large Language Models

Z Liang, Y Xu, Y Hong, P Shang, Q Wang… - Proceedings of the 3rd …, 2024 - dl.acm.org
With the widespread application of the Transformer architecture in various modalities,
including vision, the technology of large language models is evolving from a single modality …

Foundation Models Defining a New Era in Vision: a Survey and Outlook

M Awais, M Naseer, S Khan, RM Anwer… - … on Pattern Analysis …, 2025 - ieeexplore.ieee.org
Vision systems that see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …

A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions

L Huang, W Yu, W Ma, W Zhong, Z Feng… - ACM Transactions on …, 2025 - dl.acm.org
The emergence of large language models (LLMs) has marked a significant breakthrough in
natural language processing (NLP), fueling a paradigm shift in information acquisition …

Woodpecker: Hallucination correction for multimodal large language models

S Yin, C Fu, S Zhao, T Xu, H Wang, D Sui… - Science China …, 2024 - Springer
Hallucination is a big shadow hanging over the rapidly evolving multimodal large language
models (MLLMs), referring to the phenomenon that the generated text is inconsistent with the image content …

A survey on hallucination in large vision-language models

H Liu, W Xue, Y Chen, D Chen, X Zhao, K Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent development of Large Vision-Language Models (LVLMs) has attracted growing
attention within the AI landscape for its practical implementation potential. However, …

Hallucination of multimodal large language models: A survey

Z Bai, P Wang, T Xiao, T He, Z Han, Z Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
This survey presents a comprehensive analysis of the phenomenon of hallucination in
multimodal large language models (MLLMs), also known as Large Vision-Language Models …

MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models

MS Sepehri, Z Fabian, M Soltanolkotabi… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) have tremendous potential to improve the
accuracy, availability, and cost-effectiveness of healthcare by providing automated solutions …

RLAIF-V: Aligning MLLMs through open-source AI feedback for super GPT-4V trustworthiness

T Yu, H Zhang, Y Yao, Y Dang, D Chen, X Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning from feedback reduces the hallucination of multimodal large language models
(MLLMs) by aligning them with human preferences. While traditional methods rely on labor …

Aligning modalities in vision large language models via preference fine-tuning

Y Zhou, C Cui, R Rafailov, C Finn, H Yao - arXiv preprint arXiv:2402.11411, 2024 - arxiv.org
Instruction-following Vision Large Language Models (VLLMs) have achieved significant
progress recently on a variety of tasks. These approaches merge strong pre-trained vision …

HALC: Object hallucination reduction via adaptive focal-contrast decoding

Z Chen, Z Zhao, H Luo, H Yao, B Li, J Zhou - arXiv preprint arXiv …, 2024 - arxiv.org
While large vision-language models (LVLMs) have demonstrated impressive capabilities in
interpreting multi-modal contexts, they inevitably suffer from object hallucinations (OH). We …