A Survey of Multimodel Large Language Models

Z Liang, Y Xu, Y Hong, P Shang, Q Wang… - Proceedings of the 3rd …, 2024 - dl.acm.org
With the widespread application of the Transformer architecture in various modalities,
including vision, the technology of large language models is evolving from a single modality …

Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org
Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Llava-med: Training a large language-and-vision assistant for biomedicine in one day

C Li, C Wong, S Zhang, N Usuyama… - Advances in …, 2023 - proceedings.neurips.cc
Conversational generative AI has demonstrated remarkable promise for empowering
biomedical practitioners, but current investigations focus on unimodal text. Multimodal …

Towards generalist biomedical AI

T Tu, S Azizi, D Driess, M Schaekermann, M Amin… - Nejm Ai, 2024 - ai.nejm.org
Background Medicine is inherently multimodal, requiring the simultaneous interpretation
and integration of insights between many data modalities spanning text, imaging, genomics …

What matters when building vision-language models?

H Laurençon, L Tronchon, M Cord… - Advances in Neural …, 2025 - proceedings.neurips.cc
The growing interest in vision-language models (VLMs) has been driven by improvements in
large language models and vision transformers. Despite the abundance of literature on this …

Cambrian-1: A fully open, vision-centric exploration of multimodal llms

S Tong, E Brown, P Wu, S Woo, M Middepogu… - arxiv preprint arxiv …, 2024 - arxiv.org
We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-
centric approach. While stronger language models can enhance multimodal capabilities, the …

Quilt-1m: One million image-text pairs for histopathology

W Ikezogwo, S Seyfioglu, F Ghezloo… - Advances in neural …, 2023 - proceedings.neurips.cc
Recent accelerations in multi-modal applications have been made possible with the
plethora of image and text data available online. However, the scarcity of analogous data in …

A generalist vision–language foundation model for diverse biomedical tasks

K Zhang, R Zhou, E Adhikarla, Z Yan, Y Liu, J Yu… - Nature Medicine, 2024 - nature.com
Traditional biomedical artificial intelligence (AI) models, designed for specific tasks or
modalities, often exhibit limited flexibility in real-world deployment and struggle to utilize …

A survey of large language models in medicine: Progress, application, and challenge

H Zhou, F Liu, B Gu, X Zou, J Huang, J Wu, Y Li… - arxiv preprint arxiv …, 2023 - arxiv.org
Large language models (LLMs), such as ChatGPT, have received substantial attention due
to their capabilities for understanding and generating human language. While there has …

MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models

MS Sepehri, Z Fabian, M Soltanolkotabi… - arxiv preprint arxiv …, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) have tremendous potential to improve the
accuracy, availability, and cost-effectiveness of healthcare by providing automated solutions …