Foundation models for generalist medical artificial intelligence
The exceptionally rapid development of highly flexible, reusable artificial intelligence (AI)
models is likely to usher in newfound capabilities in medicine. We propose a new paradigm …
Vision-language pre-training: Basics, recent advances, and future trends
This monograph surveys vision-language pre-training (VLP) methods for multimodal
intelligence that have been developed in the last few years. We group these approaches …
Language is not all you need: Aligning perception with language models
A big convergence of language, multimodal perception, action, and world modeling is a key
step toward artificial general intelligence. In this work, we introduce KOSMOS-1, a …
OBELICS: An open web-scale filtered dataset of interleaved image-text documents
Large multimodal models trained on natural documents, which interleave images and text,
outperform models trained on image-text pairs on various multimodal benchmarks …
Automated program repair in the era of large pre-trained language models
Automated Program Repair (APR) aims to help developers automatically patch software
bugs. However, current state-of-the-art traditional and learning-based APR techniques face …
Multimodal C4: An open, billion-scale corpus of images interleaved with text
In-context vision and language models like Flamingo support arbitrarily interleaved
sequences of images and text as input. This format not only enables few-shot learning via …
Grounding language models to images for multimodal inputs and outputs
We propose an efficient method to ground pretrained text-only language models to the
visual domain, enabling them to process arbitrarily interleaved image-and-text data, and …
Flamingo: a visual language model for few-shot learning
Building models that can be rapidly adapted to novel tasks using only a handful of annotated
examples is an open challenge for multimodal machine learning research. We introduce …