TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data

JA Irvin, ER Liu, JC Chen, I Dormoy, J Kim… - arXiv preprint arXiv …, 2024 - arxiv.org
Large vision and language assistants have enabled new capabilities for interpreting natural
images. These approaches have recently been adapted to earth observation data, but they …

Large Vision-Language Model Alignment and Misalignment: A Survey Through the Lens of Explainability

D Shu, H Zhao, J Hu, W Liu, L Cheng, M Du - arXiv preprint arXiv …, 2025 - arxiv.org
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in
processing both visual and textual information. However, the critical challenge of alignment …

ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models

J Chen, T Zhang, S Huang, Y Niu, L Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite the recent breakthroughs achieved by Large Vision Language Models (LVLMs) in
understanding and responding to complex visual-textual contexts, their inherent …

Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models

C Cui, G Deng, A Zhang, J Zheng, Y Li, L Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advances in Large Vision-Language Models (LVLMs) have showcased strong
reasoning abilities across multiple modalities, achieving significant breakthroughs in various …

VidHal: Benchmarking Temporal Hallucinations in Vision LLMs

WY Choong, Y Guo, M Kankanhalli - arXiv preprint arXiv:2411.16771, 2024 - arxiv.org
Vision Large Language Models (VLLMs) are widely acknowledged to be prone to
hallucination. Existing research addressing this problem has primarily been confined to …

Drawing the Line: Enhancing Trustworthiness of MLLMs Through the Power of Refusal

Y Wang, Z Zhu, H Liu, Y Liao, H Liu, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal large language models (MLLMs) excel at multimodal perception and
understanding, yet their tendency to generate hallucinated or inaccurate responses …
