Hallucination of multimodal large language models: A survey

Z Bai, P Wang, T Xiao, T He, Z Han, Z Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
This survey presents a comprehensive analysis of the phenomenon of hallucination in
multimodal large language models (MLLMs), also known as Large Vision-Language Models …

RLAIF-V: Aligning MLLMs through open-source AI feedback for super GPT-4V trustworthiness

T Yu, H Zhang, Y Yao, Y Dang, D Chen, X Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning from feedback reduces the hallucination of multimodal large language models
(MLLMs) by aligning them with human preferences. While traditional methods rely on labor …

LLaFS: When large language models meet few-shot segmentation

L Zhu, T Chen, D Ji, J Ye, J Liu - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
This paper proposes LLaFS, the first attempt to leverage large language models (LLMs) in
few-shot segmentation. In contrast to the conventional few-shot segmentation methods that …

ControlMLLM: Training-free visual prompt learning for multimodal large language models

M Wu, X Cai, J Ji, J Li, O Huang… - Advances in …, 2025 - proceedings.neurips.cc
In this work, we propose a training-free method to inject visual prompts into Multimodal
Large Language Models (MLLMs) through learnable latent variable optimization. We …

Model tailor: Mitigating catastrophic forgetting in multi-modal large language models

D Zhu, Z Sun, Z Li, T Shen, K Yan, S Ding… - arXiv preprint arXiv …, 2024 - arxiv.org
Catastrophic forgetting emerges as a critical challenge when fine-tuning multi-modal large
language models (MLLMs), where improving performance on unseen tasks often leads to a …

AGLA: Mitigating object hallucinations in large vision-language models with assembly of global and local attention

W An, F Tian, S Leng, J Nie, H Lin, QY Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite their great success across various multimodal tasks, Large Vision-Language
Models (LVLMs) are facing a prevalent problem with object hallucinations, where the …

SAM2-Adapter: Evaluating & adapting Segment Anything 2 in downstream tasks: Camouflage, shadow, medical image segmentation, and more

T Chen, A Lu, L Zhu, C Ding, C Yu, D Ji, Z Li… - arXiv preprint arXiv …, 2024 - arxiv.org
The advent of large models, also known as foundation models, has significantly transformed
the AI research landscape, with models like Segment Anything (SAM) achieving notable …

Alleviating hallucination in large vision-language models with active retrieval augmentation

X Qu, Q Chen, W Wei, J Sun, J Dong - arXiv preprint arXiv:2408.00555, 2024 - arxiv.org
Despite the remarkable ability of large vision-language models (LVLMs) in image
comprehension, these models frequently generate plausible yet factually incorrect …

Discrete latent perspective learning for segmentation and detection

D Ji, F Zhao, L Zhu, W Jin, H Lu, J Ye - arXiv preprint arXiv:2406.10475, 2024 - arxiv.org
In this paper, we address the challenge of Perspective-Invariant Learning in machine
learning and computer vision, which involves enabling a network to understand images from …

EventHallusion: Diagnosing event hallucinations in video LLMs

J Zhang, Y Jiao, S Chen, J Chen, YG Jiang - arXiv preprint arXiv …, 2024 - arxiv.org
Recently, Multimodal Large Language Models (MLLMs) have made significant progress in
the video comprehension field. Despite remarkable content reasoning and instruction …