A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …
Self-supervised learning for medical image classification: a systematic review and implementation guidelines
Advancements in deep learning and computer vision provide promising solutions for
medical image analysis, potentially improving healthcare and patient outcomes. However …
DINOv2: Learning robust visual features without supervision
The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …
SpectralGPT: Spectral remote sensing foundation model
The foundation model has recently garnered significant attention due to its potential to
revolutionize the field of visual representation learning in a self-supervised manner. While …
ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders
Driven by improved architectures and better representation learning frameworks, the field of
visual recognition has enjoyed rapid modernization and performance boost in the early …
VideoMAE V2: Scaling video masked autoencoders with dual masking
Scale is the primary factor for building a powerful foundation model that could well
generalize to a variety of downstream tasks. However, it is still challenging to train video …
Emergent correspondence from image diffusion
Finding correspondences between images is a fundamental problem in computer vision. In
this paper, we show that correspondence emerges in image diffusion models without any …
Deep learning in food category recognition
Integrating artificial intelligence with food category recognition has been a field of
research interest for the past few decades. It is potentially one of the next steps in revolutionizing …
Revisiting class-incremental learning with pre-trained models: Generalizability and adaptivity are all you need
Class-incremental learning (CIL) aims to adapt to emerging new classes without forgetting
old ones. Traditional CIL models are trained from scratch to continually acquire knowledge …
Self-supervised learning from images with a joint-embedding predictive architecture
This paper demonstrates an approach for learning highly semantic image representations
without relying on hand-crafted data-augmentations. We introduce the Image-based Joint …