A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends
Deep supervised learning algorithms typically require a large volume of labeled data to
achieve satisfactory performance. However, the process of collecting and labeling such data …
Self-supervised learning for videos: A survey
The remarkable success of deep learning in various domains relies on the availability of
large-scale annotated datasets. However, obtaining annotations is expensive and requires …
VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training
Pre-training video transformers on extra large-scale datasets is generally required to
achieve premier performance on relatively small datasets. In this paper, we show that video …
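VideoMAE's central design is "tube" masking at an extremely high ratio (90-95%): one spatial mask is sampled and repeated across time, which keeps the model from trivially copying a patch out of a neighboring frame. A minimal PyTorch sketch of that idea (function name, shapes, and the ratio are illustrative, not the paper's code):

import torch

def tube_mask(num_temporal_tokens, tokens_per_frame, mask_ratio=0.9):
    # Sample one spatial mask and repeat it along time ("tube" masking),
    # so a masked location stays hidden in every frame; this blocks
    # trivial reconstruction from temporally adjacent patches.
    num_masked = int(tokens_per_frame * mask_ratio)
    order = torch.rand(tokens_per_frame).argsort()
    mask = torch.zeros(tokens_per_frame, dtype=torch.bool)
    mask[order[:num_masked]] = True
    return mask.repeat(num_temporal_tokens)  # (T * N,), True = masked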
Masked autoencoders as spatiotemporal learners
This paper studies a conceptually simple extension of Masked Autoencoders (MAE) to
spatiotemporal representation learning from videos. We randomly mask out spacetime …
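Here the masking is random over all spacetime patches rather than tube-shaped, again at a very high ratio, and the encoder runs only on the small visible subset. A hedged sketch of the sampling step in PyTorch (names and shapes are illustrative):

import torch

def random_spacetime_mask(batch, num_tokens, mask_ratio=0.9):
    # Drop every (time, space) patch independently with equal
    # probability; return indices of the patches that stay visible.
    len_keep = int(num_tokens * (1 - mask_ratio))
    noise = torch.rand(batch, num_tokens)
    ids_keep = noise.argsort(dim=1)[:, :len_keep]
    return ids_keep  # (B, len_keep) indices of visible patches

The encoder then processes only the gathered visible tokens, which is where most of the pre-training speedup comes from.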
VideoMAE V2: Scaling video masked autoencoders with dual masking
Scale is the primary factor for building a powerful foundation model that could generalize
well to a variety of downstream tasks. However, it is still challenging to train video …
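Dual masking adds a decoder-side mask on top of the usual encoder mask: the decoder reconstructs only a subset of the video cube instead of all masked patches, cutting decoder cost as well. A rough sketch under that reading (the paper uses tube masking for the encoder and a running-cell pattern for the decoder; the uniform sampling below is a simplification):

import torch

def dual_masks(num_tokens, enc_mask_ratio=0.9, dec_keep_ratio=0.5):
    # Encoder mask: the encoder sees only ~10% of all tokens.
    n_visible = int(num_tokens * (1 - enc_mask_ratio))
    visible_ids = torch.rand(num_tokens).argsort()[:n_visible]
    # Decoder mask: reconstruct only ~50% of the cube, not all of it.
    n_decode = int(num_tokens * dec_keep_ratio)
    decode_ids = torch.rand(num_tokens).argsort()[:n_decode]
    return visible_ids, decode_ids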
Masked feature prediction for self-supervised visual pre-training
We present Masked Feature Prediction (MaskFeat) for self-supervised pre-training
of video models. Our approach first randomly masks out a portion of the input sequence and …
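MaskFeat regresses features of the masked regions rather than raw pixels, with HOG descriptors as a particularly effective target. A sketch of the target computation using scikit-image (parameter values are illustrative):

import numpy as np
from skimage.feature import hog

def hog_target(frame, cell=8):
    # MaskFeat's regression target: the model predicts the HOG
    # descriptor of each masked patch instead of its pixels.
    return hog(frame,
               orientations=9,
               pixels_per_cell=(cell, cell),
               cells_per_block=(1, 1),
               feature_vector=True,
               channel_axis=-1 if frame.ndim == 3 else None)

The training loss is then a regression (e.g., MSE) between the predicted features and these targets at the masked positions only.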
ST-Adapter: Parameter-efficient image-to-video transfer learning
Capitalizing on large pre-trained models for various downstream tasks of interest has
recently emerged with promising performance. Due to the ever-growing model size, the …
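The adapter itself is a small bottleneck with a depthwise 3D convolution that gives a frozen image transformer temporal reasoning at a tiny parameter cost. A sketch of one such module in PyTorch (dimensions and exact placement within the block are assumptions based on the bottleneck-plus-3D-conv description):

import torch.nn as nn

class STAdapter(nn.Module):
    # Bottleneck adapter with a depthwise 3D conv over time, inserted
    # into an otherwise frozen pre-trained image transformer block.
    def __init__(self, dim, bottleneck=128, kernel=(3, 1, 1)):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.conv = nn.Conv3d(bottleneck, bottleneck, kernel,
                              padding=tuple(k // 2 for k in kernel),
                              groups=bottleneck)  # depthwise
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)

    def forward(self, x, t, h, w):
        # x: (B, T*H*W, C) patch tokens from the frozen backbone
        b, n, _ = x.shape
        z = self.down(x).view(b, t, h, w, -1).permute(0, 4, 1, 2, 3)
        z = self.conv(z)                        # temporal mixing
        z = z.permute(0, 2, 3, 4, 1).reshape(b, n, -1)
        return x + self.up(self.act(z))         # residual connection

Only these adapters are trained; the backbone weights stay fixed.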
Frozen CLIP models are efficient video learners
Video recognition has been dominated by the end-to-end learning paradigm: first initializing
a video recognition model with weights of a pretrained image model and then conducting …
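Instead of end-to-end fine-tuning, the paper keeps the CLIP image encoder frozen and trains only a lightweight decoder over per-frame features. A minimal sketch using the openai/CLIP package (model name and feature handling are illustrative):

import torch
import clip  # openai/CLIP, assumed installed

model, _ = clip.load("ViT-B/16")
for p in model.parameters():
    p.requires_grad_(False)           # the CLIP backbone stays frozen

def frame_features(video):            # video: (B, T, 3, 224, 224)
    b, t = video.shape[:2]
    with torch.no_grad():
        feats = model.encode_image(video.flatten(0, 1))  # (B*T, 512)
    return feats.view(b, t, -1)       # per-frame CLIP features

Only a small temporal head (a lightweight transformer decoder in the paper) is trained on top of these frozen features.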
BEVT: BERT pretraining of video transformers
This paper studies the BERT pretraining of video transformers. It is a straightforward but
worth-studying extension given the recent success from BERT pretraining of image …
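The BERT analogy is literal: patches are masked and the model predicts their discrete visual tokens, produced by a pretrained tokenizer (a VQ-VAE in the paper), under a cross-entropy loss. A minimal sketch of that objective (the tokenizer itself is not shown):

import torch.nn.functional as F

def masked_token_loss(logits, token_ids, mask):
    # logits:    (B, N, vocab) predictions from the video transformer
    # token_ids: (B, N) discrete codes from the visual tokenizer
    # mask:      (B, N) boolean, True where the patch was masked
    return F.cross_entropy(logits[mask], token_ids[mask])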
Siamese masked autoencoders
Establishing correspondence between images or scenes is a significant challenge in
computer vision, especially given occlusions, viewpoint changes, and varying object …
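Siamese MAE tackles correspondence by masking a pair of frames asymmetrically: the past frame is left (almost) intact while most of the future frame is hidden, so the decoder must borrow information across time. A sketch of that sampling step (ratios are illustrative; the paper masks the future frame very heavily, around 95%):

import torch

def asymmetric_masks(num_tokens, future_mask_ratio=0.95):
    # Past frame: keep every patch visible.
    past_visible = torch.arange(num_tokens)
    # Future frame: keep only ~5% of patches visible.
    keep = int(num_tokens * (1 - future_mask_ratio))
    future_visible = torch.rand(num_tokens).argsort()[:keep]
    return past_visible, future_visible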