- Academic Search

A Survey of Multimodel Large Language Models

Z Liang, Y Xu, Y Hong, P Shang, Q Wang… - Proceedings of the 3rd …, 2024 - dl.acm.org

With the widespread application of the Transformer architecture in various modalities,
including vision, the technology of large language models is evolving from a single modality …

Cited by 1224

Foundations & trends in multimodal machine learning: Principles, challenges, and open questions

PP Liang, A Zadeh, LP Morency - ACM Computing Surveys, 2024 - dl.acm.org

Multimodal machine learning is a vibrant multi-disciplinary research field that aims to design
computer agents with intelligent capabilities such as understanding, reasoning, and learning …

Cited by 92

LLaVA-Med: Training a large language-and-vision assistant for biomedicine in one day

C Li, C Wong, S Zhang, N Usuyama… - Advances in …, 2023 - proceedings.neurips.cc

Conversational generative AI has demonstrated remarkable promise for empowering
biomedical practitioners, but current investigations focus on unimodal text. Multimodal …

Cited by 649

Towards generalist biomedical AI

T Tu, S Azizi, D Driess, M Schaekermann, M Amin… - NEJM AI, 2024 - ai.nejm.org

Background Medicine is inherently multimodal, requiring the simultaneous interpretation
and integration of insights between many data modalities spanning text, imaging, genomics …

Cited by 337

What matters when building vision-language models?

H Laurençon, L Tronchon, M Cord… - Advances in Neural …, 2025 - proceedings.neurips.cc

The growing interest in vision-language models (VLMs) has been driven by improvements in
large language models and vision transformers. Despite the abundance of literature on this …

Cited by 159

Cambrian-1: A fully open, vision-centric exploration of multimodal LLMs

S Tong, E Brown, P Wu, S Woo, M Middepogu… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce Cambrian-1, a family of multimodal LLMs (MLLMs) designed with a vision-centric
approach. While stronger language models can enhance multimodal capabilities, the …

Cited by 205

Quilt-1M: One million image-text pairs for histopathology

W Ikezogwo, S Seyfioglu, F Ghezloo… - Advances in neural …, 2023 - proceedings.neurips.cc

Recent accelerations in multi-modal applications have been made possible with the
plethora of image and text data available online. However, the scarcity of analogous data in …

Cited by 104

A generalist vision–language foundation model for diverse biomedical tasks

K Zhang, R Zhou, E Adhikarla, Z Yan, Y Liu, J Yu… - Nature Medicine, 2024 - nature.com

Traditional biomedical artificial intelligence (AI) models, designed for specific tasks or
modalities, often exhibit limited flexibility in real-world deployment and struggle to utilize …

Cited by 74

A survey of large language models in medicine: Progress, application, and challenge

H Zhou, F Liu, B Gu, X Zou, J Huang, J Wu, Y Li… - arXiv preprint arXiv …, 2023 - arxiv.org

Large language models (LLMs), such as ChatGPT, have received substantial attention due
to their capabilities for understanding and generating human language. While there has …

Cited by 108

MediConfusion: Can you trust your AI radiologist? Probing the reliability of multimodal medical foundation models

MS Sepehri, Z Fabian, M Soltanolkotabi… - arXiv preprint arXiv …, 2024 - arxiv.org

Multimodal Large Language Models (MLLMs) have tremendous potential to improve the
accuracy, availability, and cost-effectiveness of healthcare by providing automated solutions …

Cited by 94
