A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - International Journal of …, 2024 - Springer
Abstract Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …

Foundation Models Defining a New Era in Vision: a Survey and Outlook

M Awais, M Naseer, S Khan, RM Anwer… - … on Pattern Analysis …, 2025 - ieeexplore.ieee.org
Vision systems that see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …

SpectralGPT: Spectral remote sensing foundation model

D Hong, B Zhang, X Li, Y Li, C Li, J Yao… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
The foundation model has recently garnered significant attention due to its potential to
revolutionize the field of visual representation learning in a self-supervised manner. While …

Distilling step-by-step! Outperforming larger language models with less training data and smaller model sizes

CY Hsieh, CL Li, CK Yeh, H Nakhost, Y Fujii… - arXiv preprint arXiv …, 2023 - arxiv.org
Deploying large language models (LLMs) is challenging because they are memory
inefficient and compute-intensive for practical applications. In reaction, researchers train …

A foundation model for generalizable disease detection from retinal images

Y Zhou, MA Chia, SK Wagner, MS Ayhan… - Nature, 2023 - nature.com
Medical artificial intelligence (AI) offers great potential for recognizing signs of health
conditions in retinal images and expediting the diagnosis of eye diseases and systemic …

Learn to explain: Multimodal reasoning via thought chains for science question answering

P Lu, S Mishra, T Xia, L Qiu… - Advances in …, 2022 - proceedings.neurips.cc
When answering a question, humans utilize the information available across different
modalities to synthesize a consistent and complete chain of thought (CoT). This process is …

Self-supervised learning from images with a joint-embedding predictive architecture

M Assran, Q Duval, I Misra… - Proceedings of the …, 2023 - openaccess.thecvf.com
This paper demonstrates an approach for learning highly semantic image representations
without relying on hand-crafted data-augmentations. We introduce the Image-based Joint …

Multimodal chain-of-thought reasoning in language models

Z Zhang, A Zhang, M Li, H Zhao, G Karypis… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have shown impressive performance on complex reasoning
by leveraging chain-of-thought (CoT) prompting to generate intermediate reasoning chains …

Inner monologue: Embodied reasoning through planning with language models

W Huang, F Xia, T Xiao, H Chan, J Liang… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent works have shown how the reasoning capabilities of Large Language Models
(LLMs) can be applied to domains beyond natural language processing, such as planning …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
This monograph presents a comprehensive survey of the taxonomy and evolution of multimodal
foundation models that demonstrate vision and vision-language capabilities, focusing on the …