Explainable and interpretable multimodal large language models: A comprehensive survey

Y Dang, K Huang, J Huo, Y Yan, S Huang, D Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid development of Artificial Intelligence (AI) has revolutionized numerous fields, with
large language models (LLMs) and computer vision (CV) systems driving advancements in …

Analyzing Fine-tuning Representation Shift for Multimodal LLMs Steering alignment

P Khayatan, M Shukor, J Parekh, M Cord - arXiv preprint arXiv:2501.03012, 2025 - arxiv.org
Multimodal LLMs have reached remarkable levels of proficiency in understanding
multimodal inputs, driving extensive research to develop increasingly powerful models …

SAVEn-Vid: Synergistic Audio-Visual Integration for Enhanced Understanding in Long Video Context

J Li, S Tao, Y Yan, X Gu, H Xu, X Zheng, Y Lyu… - arXiv preprint arXiv …, 2024 - arxiv.org
Endeavors have been made to explore Large Language Models for video analysis
(Video-LLMs), particularly in understanding and interpreting long videos. However, existing Video …