A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - International Journal of …, 2024 - Springer
Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …

A comprehensive survey on test-time adaptation under distribution shifts

J Liang, R He, T Tan - International Journal of Computer Vision, 2025 - Springer
Machine learning methods strive to acquire a robust model during the training
process that can effectively generalize to test samples, even in the presence of distribution …

DINOv2: Learning robust visual features without supervision

M Oquab, T Darcet, T Moutakanni, H Vo… - arXiv preprint arXiv …, 2023 - arxiv.org
The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …

Depth anything: Unleashing the power of large-scale unlabeled data

L Yang, B Kang, Z Huang, X Xu… - Proceedings of the …, 2024 - openaccess.thecvf.com
This work presents Depth Anything, a highly practical solution for robust monocular
depth estimation. Without pursuing novel technical modules, we aim to build a simple yet …

Emergent correspondence from image diffusion

L Tang, M Jia, Q Wang, CP Phoo… - Advances in Neural …, 2023 - proceedings.neurips.cc
Finding correspondences between images is a fundamental problem in computer vision. In
this paper, we show that correspondence emerges in image diffusion models without any …

Scaling language-image pre-training via masking

Y Li, H Fan, R Hu… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present Fast Language-Image Pre-training (FLIP), a simple and more efficient
method for training CLIP. Our method randomly masks out and removes a large portion of …

Eyes wide shut? Exploring the visual shortcomings of multimodal LLMs

S Tong, Z Liu, Y Zhai, Y Ma… - Proceedings of the …, 2024 - openaccess.thecvf.com
Is vision good enough for language? Recent advancements in multimodal models primarily
stem from the powerful reasoning abilities of large language models (LLMs). However, the …

Self-supervised learning from images with a joint-embedding predictive architecture

M Assran, Q Duval, I Misra… - Proceedings of the …, 2023 - openaccess.thecvf.com
This paper demonstrates an approach for learning highly semantic image representations
without relying on hand-crafted data-augmentations. We introduce the Image-based Joint …

Prompt, generate, then cache: Cascade of foundation models makes strong few-shot learners

R Zhang, X Hu, B Li, S Huang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Visual recognition in low-data regimes requires deep neural networks to learn generalized
representations from limited training samples. Recently, CLIP-based methods have shown …

Sequential modeling enables scalable learning for large vision models

Y Bai, X Geng, K Mangalam, A Bar… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce a novel sequential modeling approach which enables learning a Large Vision
Model (LVM) without making use of any linguistic data. To do this, we define a common …