Explainable and interpretable multimodal large language models: A comprehensive survey

Y Dang, K Huang, J Huo, Y Yan, S Huang, D Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid development of Artificial Intelligence (AI) has revolutionized numerous fields, with
large language models (LLMs) and computer vision (CV) systems driving advancements in …

Reefknot: A comprehensive benchmark for relation hallucination evaluation, analysis and mitigation in multimodal large language models

K Zheng, J Chen, Y Yan, X Zou, X Hu - arXiv preprint arXiv:2408.09429, 2024 - arxiv.org
Hallucination issues persistently plague current multimodal large language models
(MLLMs). While existing research primarily focuses on object-level or attribute-level …

Miner: Mining the underlying pattern of modality-specific neurons in multimodal large language models

K Huang, J Huo, Y Yan, K Wang, Y Yue… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, multimodal large language models (MLLMs) have significantly advanced,
integrating more modalities into diverse applications. However, the lack of explainability …

Mitigating modality prior-induced hallucinations in multimodal large language models via deciphering attention causality

G Zhou, Y Yan, X Zou, K Wang, A Liu, X Hu - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) have emerged as a central focus in both
industry and academia, but often suffer from biases introduced by visual and language …

ErrorRadar: Benchmarking complex mathematical reasoning of multimodal large language models via error detection

Y Yan, S Wang, J Huo, H Li, B Li, J Su, X Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
As the field of Multimodal Large Language Models (MLLMs) continues to evolve, their
potential to revolutionize artificial intelligence is particularly promising, especially in …

SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context

J Li, S Tao, Y Yan, X Gu, H Xu, X Zheng, Y Lyu… - arXiv preprint arXiv …, 2024 - arxiv.org
Endeavors have been made to explore Large Language Models for video analysis (Video-LLMs),
particularly in understanding and interpreting long videos. However, existing Video …

Performance Gap in Entity Knowledge Extraction Across Modalities in Vision Language Models

I Cohen, D Gottesman, M Geva, R Giryes - arXiv preprint arXiv:2412.14133, 2024 - arxiv.org
Vision-language models (VLMs) excel at extracting and reasoning about information from
images. Yet, their capacity to leverage internal knowledge about specific entities remains …