TEOChat: A Large Vision-Language Assistant for Temporal Earth Observation Data
Large vision and language assistants have enabled new capabilities for interpreting natural
images. These approaches have recently been adapted to earth observation data, but they …
Large Vision-Language Model Alignment and Misalignment: A Survey Through the Lens of Explainability
Large Vision-Language Models (LVLMs) have demonstrated remarkable capabilities in
processing both visual and textual information. However, the critical challenge of alignment …
ICT: Image-Object Cross-Level Trusted Intervention for Mitigating Object Hallucination in Large Vision-Language Models
Despite the recent breakthroughs achieved by Large Vision Language Models (LVLMs) in
understanding and responding to complex visual-textual contexts, their inherent …
Safe + Safe = Unsafe? Exploring How Safe Images Can Be Exploited to Jailbreak Large Vision-Language Models
Recent advances in Large Vision-Language Models (LVLMs) have showcased strong
reasoning abilities across multiple modalities, achieving significant breakthroughs in various …
VidHal: Benchmarking Temporal Hallucinations in Vision LLMs
Vision Large Language Models (VLLMs) are widely acknowledged to be prone to
hallucination. Existing research addressing this problem has primarily been confined to …
Drawing the Line: Enhancing Trustworthiness of MLLMs Through the Power of Refusal
Multimodal large language models (MLLMs) excel at multimodal perception and
understanding, yet their tendency to generate hallucinated or inaccurate responses …