MME-Survey: A comprehensive survey on evaluation of multimodal LLMs

C Fu, YF Zhang, S Yin, B Li, X Fang, S Zhao… - arXiv preprint arXiv…, 2024 - arxiv.org
As a prominent direction of Artificial General Intelligence (AGI), Multimodal Large Language
Models (MLLMs) have garnered increased attention from both industry and academia …

A survey on evaluation of multimodal large language models

J Huang, J Zhang - arXiv preprint arXiv:2408.15769, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) mimic the human perception and reasoning
system by integrating powerful Large Language Models (LLMs) with various modality …

Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance

Z Gao, Z Chen, E Cui, Y Ren, W Wang, J Zhu, H Tian… - Visual Intelligence, 2024 - Springer
Multi-modal large language models (MLLMs) have demonstrated impressive performance in
vision-language tasks across a wide range of domains. However, the large model scale and …

A survey on multimodal benchmarks: In the era of large AI models

L Li, G Chen, H Shi, J Xiao, L Chen - arXiv preprint arXiv:2409.18142, 2024 - arxiv.org
The rapid evolution of Multimodal Large Language Models (MLLMs) has brought substantial
advancements in artificial intelligence, significantly enhancing the capability to understand …

Investigating the quality of DermaMNIST and Fitzpatrick17k dermatological image datasets

K Abhishek, A Jain, G Hamarneh - Scientific Data, 2025 - nature.com
The remarkable progress of deep learning in dermatological tasks has brought us closer to
achieving diagnostic accuracies comparable to those of human experts. However, while …

Bridging the reality gap: A benchmark for physical reasoning in general world models with various physical phenomena beyond mechanics

P Zhao, J Xu, N Cheng, H Hu, X Zhang, X Xu… - Expert Systems with …, 2025 - Elsevier
While general world models have demonstrated excellent capability in modeling and
simulating the world through video understanding and generation, their ability to reason …

MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

Y Zuo, S Qu, Y Li, Z Chen, X Zhu, E Hua… - arXiv preprint arXiv…, 2025 - arxiv.org
We introduce MedXpertQA, a highly challenging and comprehensive benchmark to evaluate
expert-level medical knowledge and advanced reasoning. MedXpertQA includes 4,460 …

MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models

MS Sepehri, Z Fabian, M Soltanolkotabi… - arXiv preprint arXiv…, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) have tremendous potential to improve the
accuracy, availability, and cost-effectiveness of healthcare by providing automated solutions …

EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery

G Wang, L Bai, J Wang, K Yuan, Z Li, T Jiang… - arXiv preprint arXiv…, 2025 - arxiv.org
Recently, Multimodal Large Language Models (MLLMs) have demonstrated their immense
potential in computer-aided diagnosis and decision-making. In the context of robotic …

Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering

J Xue, Q Deng, F Yu, Y Wang, J Wang, Y Li - arXiv preprint arXiv…, 2024 - arxiv.org
Multimodal large language models (MLLMs), such as GPT-4o, Gemini, LLaVA, and
Flamingo, have made significant progress in integrating visual and textual modalities …