A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - International Journal of …, 2024 - Springer
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …

A review of location encoding for GeoAI: methods and applications

G Mai, K Janowicz, Y Hu, S Gao, B Yan… - International Journal …, 2022 - Taylor & Francis
A common need for artificial intelligence models in the broader geoscience is to
encode various types of spatial data, such as points, polylines, polygons, graphs, or rasters …

Vision-language models for vision tasks: A survey

J Zhang, J Huang, S Jin, S Lu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Most visual recognition studies rely heavily on crowd-labelled data in deep neural networks
(DNNs) training, and they usually train a DNN for each single visual recognition task …

EVA-02: A visual representation for neon genesis

Y Fang, Q Sun, X Wang, T Huang, X Wang… - Image and Vision …, 2024 - Elsevier
We launch EVA-02, a next-generation Transformer-based visual representation pre-trained
to reconstruct strong and robust language-aligned vision features via masked image …

InternVL: Scaling up vision foundation models and aligning for generic visual-linguistic tasks

Z Chen, J Wu, W Wang, W Su, G Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
The exponential growth of large language models (LLMs) has opened up numerous
possibilities for multi-modal AGI systems. However, the progress in vision and vision …

What does a platypus look like? Generating customized prompts for zero-shot image classification

S Pratt, I Covert, R Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Open-vocabulary models are a promising new paradigm for image classification. Unlike
traditional classification models, open-vocabulary models classify among any arbitrary set of …

Florence: A new foundation model for computer vision

L Yuan, D Chen, YL Chen, N Codella, X Dai… - arXiv preprint arXiv …, 2021 - arxiv.org
Automated visual understanding of our diverse and open world demands computer vision
models to generalize well with minimal customization for specific tasks, similar to human …

Is synthetic data from generative models ready for image recognition?

R He, S Sun, X Yu, C Xue, W Zhang, P Torr… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent text-to-image generation models have shown promising results in generating high-
fidelity photo-realistic images. Though the results are astonishing to human eyes, how …

With a little help from my friends: Nearest-neighbor contrastive learning of visual representations

D Dwibedi, Y Aytar, J Tompson… - Proceedings of the …, 2021 - openaccess.thecvf.com
Self-supervised learning algorithms based on instance discrimination train encoders to be
invariant to pre-defined transformations of the same instance. While most methods treat …

Fine-grained image analysis with deep learning: A survey

XS Wei, YZ Song, O Mac Aodha, J Wu… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Fine-grained image analysis (FGIA) is a longstanding and fundamental problem in computer
vision and pattern recognition, and underpins a diverse set of real-world applications. The …