A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT
Abstract Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …
Foundation Models Defining a New Era in Vision: A Survey and Outlook
Vision systems that see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …
SpectralGPT: Spectral remote sensing foundation model
The foundation model has recently garnered significant attention due to its potential to
revolutionize the field of visual representation learning in a self-supervised manner. While …
Distilling step-by-step! Outperforming larger language models with less training data and smaller model sizes
Deploying large language models (LLMs) is challenging because they are memory
inefficient and compute-intensive for practical applications. In reaction, researchers train …
A foundation model for generalizable disease detection from retinal images
Medical artificial intelligence (AI) offers great potential for recognizing signs of health
conditions in retinal images and expediting the diagnosis of eye diseases and systemic …
Learn to explain: Multimodal reasoning via thought chains for science question answering
When answering a question, humans utilize the information available across different
modalities to synthesize a consistent and complete chain of thought (CoT). This process is …
Self-supervised learning from images with a joint-embedding predictive architecture
This paper demonstrates an approach for learning highly semantic image representations
without relying on hand-crafted data-augmentations. We introduce the Image-based Joint …
Multimodal chain-of-thought reasoning in language models
Large language models (LLMs) have shown impressive performance on complex reasoning
by leveraging chain-of-thought (CoT) prompting to generate intermediate reasoning chains …
Inner monologue: Embodied reasoning through planning with language models
Recent works have shown how the reasoning capabilities of Large Language Models
(LLMs) can be applied to domains beyond natural language processing, such as planning …
Multimodal foundation models: From specialists to general-purpose assistants
Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …