Mirage in the Eyes: Hallucination Attack on Multi-modal Large Language Models with Only Attention Sink
Fusing visual understanding into language generation, Multi-modal Large Language
Models (MLLMs) are revolutionizing visual-language applications. Yet, these models are …
Enhancing Visual Reasoning with Autonomous Imagination in Multimodal Large Language Models
J Liu, Y Li, B **ao, Y Jian, Z Qin, T Shao… - arXiv preprint arXiv …, 2024 - arxiv.org
There have been recent efforts to extend the Chain-of-Thought (CoT) paradigm to
Multimodal Large Language Models (MLLMs) by finding visual clues in the input scene …
Evaluating Vision-Language Models as Evaluators in Path Planning
Despite their promise to perform complex reasoning, large language models (LLMs) have
been shown to have limited effectiveness in end-to-end planning. This has inspired an …
Do Vision-Language Models Represent Space and How? Evaluating Spatial Frame of Reference Under Ambiguities
Spatial expressions in situated communication can be ambiguous, as their meanings vary
depending on the frames of reference (FoR) adopted by speakers and listeners. While …
VidHal: Benchmarking Temporal Hallucinations in Vision LLMs
Vision Large Language Models (VLLMs) are widely acknowledged to be prone to
hallucination. Existing research addressing this problem has primarily been confined to …