Mirage in the Eyes: Hallucination Attack on Multi-modal Large Language Models with Only Attention Sink

Y Wang, M Zhang, J Sun, C Wang, M Yang… - arXiv preprint arXiv …, 2025 - arxiv.org
Fusing visual understanding into language generation, Multi-modal Large Language
Models (MLLMs) are revolutionizing visual-language applications. Yet, these models are …

Enhancing Visual Reasoning with Autonomous Imagination in Multimodal Large Language Models

J Liu, Y Li, B Xiao, Y Jian, Z Qin, T Shao… - arXiv preprint arXiv …, 2024 - arxiv.org
There have been recent efforts to extend the Chain-of-Thought (CoT) paradigm to
Multimodal Large Language Models (MLLMs) by finding visual clues in the input scene …

Evaluating Vision-Language Models as Evaluators in Path Planning

M Aghzal, X Yue, E Plaku, Z Yao - arXiv preprint arXiv:2411.18711, 2024 - arxiv.org
Despite their promise to perform complex reasoning, large language models (LLMs) have
been shown to have limited effectiveness in end-to-end planning. This has inspired an …

Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities

Z Zhang, F Hu, J Lee, F Shi, P Kordjamshidi… - arXiv preprint arXiv …, 2024 - arxiv.org
Spatial expressions in situated communication can be ambiguous, as their meanings vary
depending on the frames of reference (FoR) adopted by speakers and listeners. While …

VidHal: Benchmarking Temporal Hallucinations in Vision LLMs

WY Choong, Y Guo, M Kankanhalli - arXiv preprint arXiv:2411.16771, 2024 - arxiv.org
Vision Large Language Models (VLLMs) are widely acknowledged to be prone to
hallucination. Existing research addressing this problem has primarily been confined to …