Explainable and interpretable multimodal large language models: A comprehensive survey

Y Dang, K Huang, J Huo, Y Yan, S Huang, D Liu… - arXiv preprint arXiv …, 2024 - arxiv.org
The rapid development of Artificial Intelligence (AI) has revolutionized numerous fields, with
large language models (LLMs) and computer vision (CV) systems driving advancements in …

MultiEYE: Dataset and Benchmark for OCT-Enhanced Retinal Disease Recognition from Fundus Images

L Wang, C Qi, C Ou, L An, M Jin… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Existing multi-modal learning methods on fundus and OCT images mostly require both
modalities to be available and strictly paired for training and testing, which appears less …

Large Language Model with Region-guided Referring and Grounding for CT Report Generation

Z Chen, Y Bie, H Jin, H Chen - arXiv preprint arXiv:2411.15539, 2024 - arxiv.org
Computed tomography (CT) report generation is crucial to assist radiologists in interpreting
CT volumes, which can be time-consuming and labor-intensive. Existing methods primarily …