A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - International Journal of …, 2024 - Springer
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …

Contrastive representation learning: A framework and review

PH Le-Khac, G Healy, AF Smeaton - IEEE Access, 2020 - ieeexplore.ieee.org
Contrastive Learning has recently received interest due to its success in self-supervised
representation learning in the computer vision domain. However, the origins of Contrastive …

Masked autoencoders are scalable vision learners

K He, X Chen, S **e, Y Li, P Dollár… - Proceedings of the …, 2022 - openaccess.thecvf.com
This paper shows that masked autoencoders (MAE) are scalable self-supervised learners
for computer vision. Our MAE approach is simple: we mask random patches of the input …
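
The masking step the snippet refers to is easy to write down. Below is a minimal, hedged PyTorch sketch of per-sample random patch masking in the spirit of MAE; the function name and tensor shapes are illustrative assumptions, and the encoder/decoder are omitted.

```python
import torch

def random_patch_mask(patches, mask_ratio=0.75):
    """Keep a random subset of patches per sample (MAE-style masking sketch).

    patches: (batch, num_patches, dim) tensor of embedded image patches.
    Returns the visible patches, a binary mask (1 = masked), and the indices
    needed to restore the original patch order for reconstruction.
    """
    b, n, d = patches.shape
    n_keep = int(n * (1 - mask_ratio))

    noise = torch.rand(b, n)                   # per-sample random scores
    ids_shuffle = noise.argsort(dim=1)         # random permutation of patch indices
    ids_restore = ids_shuffle.argsort(dim=1)   # inverse permutation

    ids_keep = ids_shuffle[:, :n_keep]
    visible = torch.gather(patches, 1, ids_keep.unsqueeze(-1).expand(-1, -1, d))

    mask = torch.ones(b, n)                    # 1 = masked, 0 = visible
    mask[:, :n_keep] = 0
    mask = torch.gather(mask, 1, ids_restore)  # back to original patch order
    return visible, mask, ids_restore


# Example: 196 patches (14x14) from a 224x224 image, 75% masked out.
x = torch.randn(2, 196, 768)
visible, mask, ids_restore = random_patch_mask(x)
print(visible.shape, mask.sum(dim=1))          # (2, 49, 768); 147 masked per sample
```

Only the visible patches would be fed to the encoder, which is where the scalability in the title comes from.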

VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training

Z Tong, Y Song, J Wang… - Advances in neural …, 2022 - proceedings.neurips.cc
Pre-training video transformers on extra large-scale datasets is generally required to
achieve premier performance on relatively small datasets. In this paper, we show that video …

Masked autoencoders as spatiotemporal learners

C Feichtenhofer, Y Li, K He - Advances in neural …, 2022 - proceedings.neurips.cc
This paper studies a conceptually simple extension of Masked Autoencoders (MAE) to
spatiotemporal representation learning from videos. We randomly mask out spacetime …
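
In this extension the clip is cut into spacetime cubes and most of them are dropped at random. A rough sketch of the patchify step, assuming a (batch, frames, channels, height, width) layout and illustrative cube sizes:

```python
import torch

def video_to_spacetime_patches(video, t=2, p=16):
    """Cut a clip into t x p x p spacetime cubes (shapes are illustrative).

    video: (batch, frames, channels, height, width), with frames % t == 0
           and height % p == width % p == 0.
    Returns (batch, num_cubes, t * p * p * channels).
    """
    b, f, c, h, w = video.shape
    cubes = video.reshape(b, f // t, t, c, h // p, p, w // p, p)
    cubes = cubes.permute(0, 1, 4, 6, 2, 3, 5, 7)   # gather each cube's dims last
    return cubes.reshape(b, (f // t) * (h // p) * (w // p), t * c * p * p)


video = torch.randn(2, 16, 3, 224, 224)
patches = video_to_spacetime_patches(video)          # (2, 1568, 1536)
# The same random-shuffle masking sketched for image MAE above can then be
# applied, typically at a much higher ratio (around 90%) because video is
# more redundant than images.
```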

Momentum contrast for unsupervised visual representation learning

K He, H Fan, Y Wu, S **e… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
We present Momentum Contrast (MoCo) for unsupervised visual representation
learning. From a perspective on contrastive learning as dictionary look-up, we build a …
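
The dictionary look-up view is realized in the paper with a slowly updated momentum encoder and a queue of past keys serving as negatives. A hedged PyTorch sketch of the two core pieces follows; the encoders, the queue maintenance, and the hyperparameter values here are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def momentum_update(query_encoder, key_encoder, m=0.999):
    """Move the key encoder slowly toward the query encoder (MoCo-style update).
    Assumes the two encoders share the same architecture."""
    for q_param, k_param in zip(query_encoder.parameters(), key_encoder.parameters()):
        k_param.data.mul_(m).add_(q_param.data, alpha=1.0 - m)

def moco_loss(q, k, queue, temperature=0.07):
    """InfoNCE-style loss with the queue acting as the dictionary of negatives.

    q:     (batch, dim) query features from the query encoder
    k:     (batch, dim) positive keys from the momentum encoder (computed
           under torch.no_grad() in practice)
    queue: (dim, K)     previously enqueued keys, assumed L2-normalized
    """
    q = F.normalize(q, dim=1)
    k = F.normalize(k, dim=1)

    l_pos = torch.einsum("nc,nc->n", q, k).unsqueeze(-1)   # (batch, 1) positive logits
    l_neg = torch.einsum("nc,ck->nk", q, queue)            # (batch, K) negative logits

    logits = torch.cat([l_pos, l_neg], dim=1) / temperature
    labels = torch.zeros(q.size(0), dtype=torch.long)      # the positive sits at index 0
    return F.cross_entropy(logits, labels)
```

The queue decouples the dictionary size from the batch size, which is the point of treating contrastive learning as look-up in a large, consistent dictionary.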

Unsupervised learning of visual features by contrasting cluster assignments

M Caron, I Misra, J Mairal, P Goyal… - Advances in neural …, 2020 - proceedings.neurips.cc
Unsupervised image representations have significantly reduced the gap with supervised
pretraining, notably with the recent achievements of contrastive learning methods. These …

ELECTRA: Pre-training text encoders as discriminators rather than generators

K Clark - arXiv preprint arXiv:2003.10555, 2020 - academia.edu
Masked language modeling (MLM) pre-training methods such as BERT corrupt the input by
replacing some tokens with [MASK] and then train a model to reconstruct the original tokens …
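
Where BERT reconstructs the [MASK]ed tokens, ELECTRA instead corrupts the input with plausible replacements sampled from a small generator and trains a discriminator to tag each position as original or replaced. A minimal sketch of building those corruption targets, with hypothetical tensors standing in for a real tokenizer and generator:

```python
import torch

def replaced_token_targets(token_ids, generator_samples, replace_prob=0.15):
    """Build ELECTRA-style replaced-token-detection inputs and labels (sketch).

    token_ids:         (batch, seq_len) original token ids
    generator_samples: (batch, seq_len) candidate replacements, e.g. sampled
                       from a small masked-language-model generator (omitted)
    Returns the corrupted sequence and binary labels (1 = token was replaced).
    """
    replace_mask = torch.rand(token_ids.shape) < replace_prob
    corrupted = torch.where(replace_mask, generator_samples, token_ids)
    # If the generator happens to sample the original token, that position
    # counts as "original" for the discriminator.
    labels = (corrupted != token_ids).long()
    return corrupted, labels


# Toy example with a hypothetical vocabulary of 30k ids.
tokens = torch.randint(0, 30000, (2, 12))
samples = torch.randint(0, 30000, (2, 12))
corrupted, labels = replaced_token_targets(tokens, samples)
```

Because the discriminator gets a learning signal at every position rather than only at masked ones, this objective is markedly more sample-efficient than plain MLM.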

Representation learning with contrastive predictive coding

A Oord, Y Li, O Vinyals - arXiv preprint arXiv:1807.03748, 2018 - arxiv.org
While supervised learning has enabled great progress in many applications, unsupervised
learning has not seen such widespread adoption, and remains an important and …
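
The snippet breaks off before the method, but the paper's central objective is the InfoNCE loss: a context vector must pick out the true future latent among negatives drawn from other sequences. A hedged sketch with assumed shapes and a bilinear prediction head:

```python
import torch
import torch.nn.functional as F

def info_nce(context, future, W):
    """InfoNCE in the style of contrastive predictive coding (shapes illustrative).

    context: (batch, c_dim) context vectors c_t from an autoregressive model
    future:  (batch, z_dim) encoded future latents z_{t+k}, one per sequence
    W:       (c_dim, z_dim) bilinear prediction matrix for step k
    The positive for row i is future[i]; the other rows act as negatives.
    """
    scores = context @ W @ future.t()          # (batch, batch) similarity scores
    labels = torch.arange(context.size(0))     # positives lie on the diagonal
    return F.cross_entropy(scores, labels)


# Toy example: 8 sequences, 256-d context, 128-d latents.
c = torch.randn(8, 256)
z = torch.randn(8, 128)
W = torch.randn(256, 128)
loss = info_nce(c, z, W)
```

Minimizing this loss lower-bounds the mutual information between context and future latents, which is why the same objective reappears in later contrastive methods.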

Contrastive multiview coding

Y Tian, D Krishnan, P Isola - Computer Vision–ECCV 2020: 16th European …, 2020 - Springer
Humans view the world through many sensory channels, e.g., the long-wavelength light
channel, viewed by the left eye, or the high-frequency vibrations channel, heard by the right …