Google Scholar

MME-Survey: A comprehensive survey on evaluation of multimodal LLMs

C Fu, YF Zhang, S Yin, B Li, X Fang, S Zhao… - arXiv preprint arXiv …, 2024 - arxiv.org

As a prominent direction of Artificial General Intelligence (AGI), Multimodal Large Language
Models (MLLMs) have garnered increased attention from both industry and academia …

Cited by 5 · HTML version

[PDF] arxiv.org

A survey on evaluation of multimodal large language models

J Huang, J Zhang - arXiv preprint arXiv:2408.15769, 2024 - arxiv.org

Multimodal Large Language Models (MLLMs) mimic human perception and reasoning
system by integrating powerful Large Language Models (LLMs) with various modality …

Cited by 17 · HTML version

[PDF] springer.com

Mini-InternVL: a flexible-transfer pocket multi-modal model with 5% parameters and 90% performance

Z Gao, Z Chen, E Cui, Y Ren, W Wang, J Zhu, H Tian… - Visual Intelligence, 2024 - Springer

Multi-modal large language models (MLLMs) have demonstrated impressive performance in
vision-language tasks across a wide range of domains. However, the large model scale and …

Cited by 8 · All 2 versions

[PDF] arxiv.org

A survey on multimodal benchmarks: In the era of large AI models

L Li, G Chen, H Shi, J Xiao, L Chen - arXiv preprint arXiv:2409.18142, 2024 - arxiv.org

The rapid evolution of Multimodal Large Language Models (MLLMs) has brought substantial
advancements in artificial intelligence, significantly enhancing the capability to understand …

Cited by 3 · HTML version

[PDF] nature.com

Investigating the quality of DermaMNIST and Fitzpatrick17k dermatological image datasets

K Abhishek, A Jain, G Hamarneh - Scientific Data, 2025 - nature.com

The remarkable progress of deep learning in dermatological tasks has brought us closer to
achieving diagnostic accuracies comparable to those of human experts. However, while …

Cited by 2 · All 2 versions

[PDF] openreview.net

Bridging the reality gap: A benchmark for physical reasoning in general world models with various physical phenomena beyond mechanics

P Zhao, J Xu, N Cheng, H Hu, X Zhang, X Xu… - Expert Systems with …, 2025 - Elsevier

While general world models have demonstrated excellent capability in modeling and
simulating the world through video understanding and generation, their ability to reason …

All 2 versions

[PDF] arxiv.org

MedXpertQA: Benchmarking Expert-Level Medical Reasoning and Understanding

Y Zuo, S Qu, Y Li, Z Chen, X Zhu, E Hua… - arXiv preprint arXiv …, 2025 - arxiv.org

We introduce MedXpertQA, a highly challenging and comprehensive benchmark to evaluate
expert-level medical knowledge and advanced reasoning. MedXpertQA includes 4,460 …

HTML version

[PDF] arxiv.org

MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models

MS Sepehri, Z Fabian, M Soltanolkotabi… - arXiv preprint arXiv …, 2024 - arxiv.org

Multimodal Large Language Models (MLLMs) have tremendous potential to improve the
accuracy, availability, and cost-effectiveness of healthcare by providing automated solutions …

All 3 versions · HTML version

[PDF] arxiv.org

EndoChat: Grounded Multimodal Large Language Model for Endoscopic Surgery

G Wang, L Bai, J Wang, K Yuan, Z Li, T Jiang… - arXiv preprint arXiv …, 2025 - arxiv.org

Recently, Multimodal Large Language Models (MLLMs) have demonstrated their immense
potential in computer-aided diagnosis and decision-making. In the context of robotic …

HTML version

[PDF] arxiv.org

Enhanced Multimodal RAG-LLM for Accurate Visual Question Answering

J Xue, Q Deng, F Yu, Y Wang, J Wang, Y Li - arXiv preprint arXiv …, 2024 - arxiv.org

Multimodal large language models (MLLMs), such as GPT-4o, Gemini, LLaVA, and
Flamingo, have made significant progress in integrating visual and textual modalities …

HTML version
