A comprehensive review of YOLO architectures in computer vision: From YOLOv1 to YOLOv8 and YOLO-NAS

J Terven, DM Córdova-Esparza… - Machine Learning and …, 2023 - mdpi.com
YOLO has become a central real-time object detection system for robotics, driverless cars,
and video monitoring applications. We present a comprehensive analysis of YOLO's …

A comprehensive survey on pretrained foundation models: A history from bert to chatgpt

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - International Journal of …, 2024 - Springer
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …

Segment anything

A Kirillov, E Mintun, N Ravi, H Mao… - Proceedings of the …, 2023 - openaccess.thecvf.com
We introduce the Segment Anything (SA) project: a new task, model, and dataset for
image segmentation. Using our efficient model in a data collection loop, we built the largest …

YOLOv6: A single-stage object detection framework for industrial applications

C Li, L Li, H Jiang, K Weng, Y Geng, L Li, Z Ke… - arXiv preprint arXiv …, 2022 - arxiv.org
For years, the YOLO series has been the de facto industry-level standard for efficient object
detection. The YOLO community has prospered overwhelmingly to enrich its use in a …

A ConvNet for the 2020s

Z Liu, H Mao, CY Wu, C Feichtenhofer… - Proceedings of the …, 2022 - openaccess.thecvf.com
The "Roaring 20s" of visual recognition began with the introduction of Vision Transformers
(ViTs), which quickly superseded ConvNets as the state-of-the-art image classification …

CoCa: Contrastive captioners are image-text foundation models

J Yu, Z Wang, V Vasudevan, L Yeung… - arXiv preprint arXiv …, 2022 - arxiv.org
Exploring large-scale pretrained foundation models is of significant interest in computer
vision because these models can be quickly transferred to many downstream tasks. This …

ConvNeXt V2: Co-designing and scaling ConvNets with masked autoencoders

S Woo, S Debnath, R Hu, X Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Driven by improved architectures and better representation learning frameworks, the field of
visual recognition has enjoyed rapid modernization and performance boost in the early …

EVA: Exploring the limits of masked visual representation learning at scale

Y Fang, W Wang, B **e, Q Sun, L Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com
We launch EVA, a vision-centric foundation model to explore the limits of visual
representation at scale using only publicly accessible data. EVA is a vanilla ViT pre-trained …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios

X Zhu, S Lyu, X Wang, Q Zhao - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Object detection on drone-captured scenarios is a recent popular task. As drones always
navigate in different altitudes, the object scale varies violently, which burdens the …