A Survey of Multimodel Large Language Models

Z Liang, Y Xu, Y Hong, P Shang, Q Wang… - Proceedings of the 3rd …, 2024 - dl.acm.org
With the widespread application of the Transformer architecture in various modalities,
including vision, the technology of large language models is evolving from a single modality …

A survey on hallucination in large vision-language models

H Liu, W Xue, Y Chen, D Chen, X Zhao, K Wang… - arxiv preprint arxiv …, 2024 - arxiv.org
Recent development of Large Vision-Language Models (LVLMs) has attracted growing
attention within the AI landscape for its practical implementation potential. However,`` …

A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions

L Huang, W Yu, W Ma, W Zhong, Z Feng… - ACM Transactions on …, 2025 - dl.acm.org
The emergence of large language models (LLMs) has marked a significant breakthrough in
natural language processing (NLP), fueling a paradigm shift in information acquisition …

Rlhf-v: Towards trustworthy mllms via behavior alignment from fine-grained correctional human feedback

T Yu, Y Yao, H Zhang, T He, Y Han… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract Multimodal Large Language Models (MLLMs) have recently demonstrated
impressive capabilities in multimodal understanding reasoning and interaction. However …

Mitigating object hallucinations in large vision-language models through visual contrastive decoding

S Leng, H Zhang, G Chen, X Li, S Lu… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract Large Vision-Language Models (LVLMs) have advanced considerably intertwining
visual recognition and language understanding to generate content that is not only coherent …

Hallusionbench: an advanced diagnostic suite for entangled language hallucination and visual illusion in large vision-language models

T Guan, F Liu, X Wu, R **an, Z Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce" HallusionBench" a comprehensive benchmark designed for the evaluation of
image-context reasoning. This benchmark presents significant challenges to advanced large …

Opera: Alleviating hallucination in multi-modal large language models via over-trust penalty and retrospection-allocation

Q Huang, X Dong, P Zhang, B Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Hallucination posed as a pervasive challenge of multi-modal large language models
(MLLMs) has significantly impeded their real-world usage that demands precise judgment …

Internlm-xcomposer2: Mastering free-form text-image composition and comprehension in vision-language large model

X Dong, P Zhang, Y Zang, Y Cao, B Wang… - arxiv preprint arxiv …, 2024 - arxiv.org
We introduce InternLM-XComposer2, a cutting-edge vision-language model excelling in free-
form text-image composition and comprehension. This model goes beyond conventional …

Hallucination of multimodal large language models: A survey

Z Bai, P Wang, T **ao, T He, Z Han, Z Zhang… - arxiv preprint arxiv …, 2024 - arxiv.org
This survey presents a comprehensive analysis of the phenomenon of hallucination in
multimodal large language models (MLLMs), also known as Large Vision-Language Models …

Hallucidoctor: Mitigating hallucinatory toxicity in visual instruction data

Q Yu, J Li, L Wei, L Pang, W Ye, B Qin… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract Multi-modal Large Language Models (MLLMs) tuned on machine-generated
instruction-following data have demonstrated remarkable performance in various multimodal …