MME-Survey: A comprehensive survey on evaluation of multimodal LLMs
As a prominent direction of Artificial General Intelligence (AGI), Multimodal Large Language
Models (MLLMs) have garnered increased attention from both industry and academia …
A survey on evaluation of multimodal large language models
J Huang, J Zhang - arXiv preprint arXiv:2408.15769, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) mimic human perception and reasoning
system by integrating powerful Large Language Models (LLMs) with various modality …
Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance
Multi-modal large language models (MLLMs) have demonstrated impressive performance in
vision-language tasks across a wide range of domains. However, the large model scale and …
A survey on multimodal benchmarks: In the era of large AI models
The rapid evolution of Multimodal Large Language Models (MLLMs) has brought substantial
advancements in artificial intelligence, significantly enhancing the capability to understand …
Investigating the quality of DermaMNIST and Fitzpatrick17k dermatological image datasets
The remarkable progress of deep learning in dermatological tasks has brought us closer to
achieving diagnostic accuracies comparable to those of human experts. However, while …
Bridging the reality gap: A benchmark for physical reasoning in general world models with various physical phenomena beyond mechanics
While general world models have demonstrated excellent capability in modeling and
simulating the world through video understanding and generation, their ability to reason …
MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding
We introduce MedXpertQA, a highly challenging and comprehensive benchmark to evaluate
expert-level medical knowledge and advanced reasoning. MedXpertQA includes 4,460 …
MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models
Multimodal Large Language Models (MLLMs) have tremendous potential to improve the
accuracy, availability, and cost-effectiveness of healthcare by providing automated solutions …
EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery
Recently, Multimodal Large Language Models (MLLMs) have demonstrated their immense
potential in computer-aided diagnosis and decision-making. In the context of robotic …
Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering
J Xue, Q Deng, F Yu, Y Wang, J Wang, Y Li - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal large language models (MLLMs), such as GPT-4o, Gemini, LLaVA, and
Flamingo, have made significant progress in integrating visual and textual modalities …