A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT
Abstract: Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …
A comprehensive survey on test-time adaptation under distribution shifts
Abstract: Machine learning methods strive to acquire a robust model during the training
process that can effectively generalize to test samples, even in the presence of distribution …
DINOv2: Learning robust visual features without supervision
The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …
VideoMAE V2: Scaling video masked autoencoders with dual masking
Scale is the primary factor for building a powerful foundation model that could well
generalize to a variety of downstream tasks. However, it is still challenging to train video …
Scaling language-image pre-training via masking
Abstract: We present Fast Language-Image Pre-training (FLIP), a simple and more efficient
method for training CLIP. Our method randomly masks out and removes a large portion of …
Self-supervised learning from images with a joint-embedding predictive architecture
This paper demonstrates an approach for learning highly semantic image representations
without relying on hand-crafted data-augmentations. We introduce the Image-based Joint …
Prompt, generate, then cache: Cascade of foundation models makes strong few-shot learners
Visual recognition in low-data regimes requires deep neural networks to learn generalized
representations from limited training samples. Recently, CLIP-based methods have shown …
GLIGEN: Open-set grounded text-to-image generation
Large-scale text-to-image diffusion models have made amazing advances. However, the
status quo is to use text input alone, which can impede controllability. In this work, we …
Sequential modeling enables scalable learning for large vision models
We introduce a novel sequential modeling approach which enables learning a Large Vision
Model (LVM) without making use of any linguistic data. To do this we define a common …
Self-supervised learning for medical image classification: a systematic review and implementation guidelines
Advancements in deep learning and computer vision provide promising solutions for
medical image analysis, potentially improving healthcare and patient outcomes. However …