Mirage in the Eyes: Hallucination Attack on Multi-modal Large Language Models with Only Attention Sink
Fusing visual understanding into language generation, Multi-modal Large Language
Models (MLLMs) are revolutionizing visual-language applications. Yet, these models are …
Enhancing Visual Reasoning with Autonomous Imagination in Multimodal Large Language Models
J Liu, Y Li, B **ao, Y Jian, Z Qin, T Shao… - arXiv preprint arXiv …, 2024 - arxiv.org
There have been recent efforts to extend the Chain-of-Thought (CoT) paradigm to
Multimodal Large Language Models (MLLMs) by finding visual clues in the input scene …
Evaluating Vision-Language Models as Evaluators in Path Planning
Despite their promise to perform complex reasoning, large language models (LLMs) have
been shown to have limited effectiveness in end-to-end planning. This has inspired an …
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities
Spatial expressions in situated communication can be ambiguous, as their meanings vary
depending on the frames of reference (FoR) adopted by speakers and listeners. While …
VidHal: Benchmarking Temporal Hallucinations in Vision LLMs
Vision Large Language Models (VLLMs) are widely acknowledged to be prone to
hallucination. Existing research addressing this problem has primarily been confined to …