Explainable and interpretable multimodal large language models: A comprehensive survey
The rapid development of Artificial Intelligence (AI) has revolutionized numerous fields, with
large language models (LLMs) and computer vision (CV) systems driving advancements in …
Reefknot: A comprehensive benchmark for relation hallucination evaluation, analysis and mitigation in multimodal large language models
Hallucination issues have persistently plagued current multimodal large language models
(MLLMs). While existing research primarily focuses on object-level or attribute-level …
Miner: Mining the underlying pattern of modality-specific neurons in multimodal large language models
In recent years, multimodal large language models (MLLMs) have significantly advanced,
integrating more modalities into diverse applications. However, the lack of explainability …
Mitigating modality prior-induced hallucinations in multimodal large language models via deciphering attention causality
Multimodal Large Language Models (MLLMs) have emerged as a central focus in both
industry and academia, but often suffer from biases introduced by visual and language …
Errorradar: Benchmarking complex mathematical reasoning of multimodal large language models via error detection
As the field of Multimodal Large Language Models (MLLMs) continues to evolve, their
potential to revolutionize artificial intelligence is particularly promising, especially in …
SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context
J Li, S Tao, Y Yan, X Gu, H Xu, X Zheng, Y Lyu… - arXiv preprint arXiv …, 2024 - arxiv.org
Endeavors have been made to explore Large Language Models for video analysis (Video-
LLMs), particularly in understanding and interpreting long videos. However, existing Video …
Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models
Vision-language models (VLMs) excel at extracting and reasoning about information from
images. Yet, their capacity to leverage internal knowledge about specific entities remains …