- Academic Search

A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT

C Zhou, Q Li, C Li, J Yu, Y Liu, G Wang… - International Journal of …, 2024 - Springer

Pretrained Foundation Models (PFMs) are regarded as the foundation for various
downstream tasks across different data modalities. A PFM (e.g., BERT, ChatGPT, GPT-4) is …

Cited by 611 - All 2 versions

A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends

J Gui, T Chen, J Zhang, Q Cao, Z Sun… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Deep supervised learning algorithms typically require a large volume of labeled data to
achieve satisfactory performance. However, the process of collecting and labeling such data …

Cited by 128 - All 3 versions

[PDF] arxiv.org

DINOv2: Learning robust visual features without supervision

M Oquab, T Darcet, T Moutakanni, H Vo… - arXiv preprint arXiv …, 2023 - arxiv.org

The recent breakthroughs in natural language processing for model pretraining on large
quantities of data have opened the way for similar foundation models in computer vision …

Cited by 2232 - All 11 versions

[PDF] thecvf.com

ImageBind: One embedding space to bind them all

R Girdhar, A El-Nouby, Z Liu, M Singh… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present ImageBind, an approach to learn a joint embedding across six different
modalities: images, text, audio, depth, thermal, and IMU data. We show that all combinations …

Cited by 840 - All 7 versions

[PDF] thecvf.com

Image as a foreign language: BEiT pretraining for vision and vision-language tasks

W Wang, H Bao, L Dong, J Bjorck… - Proceedings of the …, 2023 - openaccess.thecvf.com

A big convergence of language, vision, and multimodal pretraining is emerging. In this work,
we introduce a general-purpose multimodal foundation model BEiT-3, which achieves …

Cited by 452 - All 5 versions

[PDF] thecvf.com

VideoMAE V2: Scaling video masked autoencoders with dual masking

L Wang, B Huang, Z Zhao, Z Tong… - Proceedings of the …, 2023 - openaccess.thecvf.com

Scale is the primary factor for building a powerful foundation model that could well
generalize to a variety of downstream tasks. However, it is still challenging to train video …

Cited by 382 - All 7 versions

[PDF] neurips.cc

Emergent correspondence from image diffusion

L Tang, M Jia, Q Wang, CP Phoo… - Advances in Neural …, 2023 - proceedings.neurips.cc

Finding correspondences between images is a fundamental problem in computer vision. In
this paper, we show that correspondence emerges in image diffusion models without any …

Cited by 300 - All 12 versions

[PDF] thecvf.com

Self-supervised learning from images with a joint-embedding predictive architecture

M Assran, Q Duval, I Misra… - Proceedings of the …, 2023 - openaccess.thecvf.com

This paper demonstrates an approach for learning highly semantic image representations
without relying on hand-crafted data-augmentations. We introduce the Image-based Joint …

Cited by 339 - All 7 versions

[PDF] arxiv.org

Foundation models in robotics: Applications, challenges, and the future

R Firoozi, J Tucker, S Tian… - … Journal of Robotics …, 2023 - journals.sagepub.com

We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …

Cited by 126 - All 2 versions

[PDF] thecvf.com

Cut and learn for unsupervised object detection and instance segmentation

X Wang, R Girdhar, SX Yu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

We propose Cut-and-LEaRn (CutLER), a simple approach for training
unsupervised object detection and segmentation models. We leverage the property of self …

Cited by 179 - All 5 versions

Unsupervised feature learning via non-parametric instance discrimination

A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT