Foundation models for generalist medical artificial intelligence
The exceptionally rapid development of highly flexible, reusable artificial intelligence (AI)
models is likely to usher in newfound capabilities in medicine. We propose a new paradigm …
models is likely to usher in newfound capabilities in medicine. We propose a new paradigm …
A comprehensive survey on pretrained foundation models: A history from bert to chatgpt
Abstract Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (eg, BERT, ChatGPT, GPT-4) is …
downstream tasks across different data modalities. A PFM (eg, BERT, ChatGPT, GPT-4) is …
Segment anything
Abstract We introduce the Segment Anything (SA) project: a new task, model, and dataset for
image segmentation. Using our efficient model in a data collection loop, we built the largest …
image segmentation. Using our efficient model in a data collection loop, we built the largest …
Dinov2: Learning robust visual features without supervision
The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …
quantities of data have opened the way for similar foundation models in computer vision …
Segment anything in medical images
Medical image segmentation is a critical component in clinical practice, facilitating accurate
diagnosis, treatment planning, and disease monitoring. However, existing methods, often …
diagnosis, treatment planning, and disease monitoring. However, existing methods, often …
Image as a foreign language: Beit pretraining for vision and vision-language tasks
A big convergence of language, vision, and multimodal pretraining is emerging. In this work,
we introduce a general-purpose multimodal foundation model BEiT-3, which achieves …
we introduce a general-purpose multimodal foundation model BEiT-3, which achieves …
SpectralGPT: Spectral remote sensing foundation model
The foundation model has recently garnered significant attention due to its potential to
revolutionize the field of visual representation learning in a self-supervised manner. While …
revolutionize the field of visual representation learning in a self-supervised manner. While …
Video-llava: Learning united visual representation by alignment before projection
The Large Vision-Language Model (LVLM) has enhanced the performance of various
downstream tasks in visual-language understanding. Most existing approaches encode …
downstream tasks in visual-language understanding. Most existing approaches encode …
Open-vocabulary panoptic segmentation with text-to-image diffusion models
We present ODISE: Open-vocabulary DIffusion-based panoptic SEgmentation, which unifies
pre-trained text-image diffusion and discriminative models to perform open-vocabulary …
pre-trained text-image diffusion and discriminative models to perform open-vocabulary …
A foundation model for clinical-grade computational pathology and rare cancers detection
The analysis of histopathology images with artificial intelligence aims to enable clinical
decision support systems and precision medicine. The success of such applications …
decision support systems and precision medicine. The success of such applications …