A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT
Abstract Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …
A review of location encoding for GeoAI: methods and applications
ABSTRACT A common need for artificial intelligence models in the broader geoscience is to
encode various types of spatial data, such as points, polylines, polygons, graphs, or rasters …
Vision-language models for vision tasks: A survey
Most visual recognition studies rely heavily on crowd-labelled data in deep neural networks
(DNNs) training, and they usually train a DNN for each single visual recognition task …
EVA-02: A visual representation for neon genesis
We launch EVA-02, a next-generation Transformer-based visual representation pre-trained
to reconstruct strong and robust language-aligned vision features via masked image …
InternVL: Scaling up vision foundation models and aligning for generic visual-linguistic tasks
The exponential growth of large language models (LLMs) has opened up numerous
possibilities for multi-modal AGI systems. However, the progress in vision and vision …
What does a platypus look like? Generating customized prompts for zero-shot image classification
Open-vocabulary models are a promising new paradigm for image classification. Unlike
traditional classification models, open-vocabulary models classify among any arbitrary set of …
Florence: A new foundation model for computer vision
Automated visual understanding of our diverse and open world demands computer vision
models to generalize well with minimal customization for specific tasks, similar to human …
Is synthetic data from generative models ready for image recognition?
Recent text-to-image generation models have shown promising results in generating high-
fidelity photo-realistic images. Though the results are astonishing to human eyes, how …
With a little help from my friends: Nearest-neighbor contrastive learning of visual representations
Self-supervised learning algorithms based on instance discrimination train encoders to be
invariant to pre-defined transformations of the same instance. While most methods treat …
Fine-grained image analysis with deep learning: A survey
Fine-grained image analysis (FGIA) is a longstanding and fundamental problem in computer
vision and pattern recognition, and underpins a diverse set of real-world applications. The …