Explainable and interpretable multimodal large language models: A comprehensive survey

Y Dang, K Huang, J Huo, Y Yan, S Huang, D Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid development of Artificial Intelligence (AI) has revolutionized numerous fields, with
large language models (LLMs) and computer vision (CV) systems driving advancements in …

Towards universality: Studying mechanistic similarity across language model architectures

J Wang, X Ge, W Shu, Q Tang, Y Zhou, Z He… - arXiv preprint arXiv …, 2024 - arxiv.org
The hypothesis of Universality in interpretability suggests that different neural networks may
converge to implement similar algorithms on similar tasks. In this work, we investigate two …

Comparing Bottom-Up and Top-Down Steering Approaches on In-Context Learning Tasks

M Brumley, J Kwon, D Krueger… - arXiv preprint arXiv …, 2024 - arxiv.org
A key objective of interpretability research on large language models (LLMs) is to develop
methods for robustly steering models toward desired behaviors. To this end, two distinct …

LLMs can see and hear without any training

K Ashutosh, Y Gandelsman, X Chen, I Misra… - arXiv preprint arXiv …, 2025 - arxiv.org
We present MILS: Multimodal Iterative LLM Solver, a surprisingly simple, training-free
approach, to imbue multimodal capabilities into your favorite LLM. Leveraging their innate …

GIFT: A Framework for Global Interpretable Faithful Textual Explanations of Vision Classifiers

É Zablocki, V Gerard, A Cardiel, E Gaussier… - arXiv preprint arXiv …, 2024 - arxiv.org
Understanding deep models is crucial for deploying them in safety-critical applications. We
introduce GIFT, a framework for deriving post-hoc, global, interpretable, and faithful textual …