A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends
Deep supervised learning algorithms typically require a large volume of labeled data to
achieve satisfactory performance. However, the process of collecting and labeling such data …
Self-supervised learning for videos: A survey
The remarkable success of deep learning in various domains relies on the availability of
large-scale annotated datasets. However, obtaining annotations is expensive and requires …
VideoMAE V2: Scaling video masked autoencoders with dual masking
Scale is the primary factor in building a powerful foundation model that generalizes well
to a variety of downstream tasks. However, it is still challenging to train video …
VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training
Pre-training video transformers on extra large-scale datasets is generally required to
achieve premier performance on relatively small datasets. In this paper, we show that video …
VATT: Transformers for multimodal self-supervised learning from raw video, audio and text
We present a framework for learning multimodal representations from unlabeled data using
convolution-free Transformer architectures. Specifically, our Video-Audio-Text Transformer …
Contrastive learning for representation degeneration problem in sequential recommendation
Recent advances in sequential deep learning models such as Transformer and BERT
have significantly facilitated sequential recommendation. However, according to our …
TCTrack: Temporal contexts for aerial tracking
Temporal contexts among consecutive frames are far from being fully utilized in existing
visual trackers. In this work, we present TCTrack, a comprehensive framework to fully exploit …
Hard negative mixing for contrastive learning
Contrastive learning has become a key component of self-supervised learning approaches
for computer vision. By learning to embed two augmented versions of the same image close …
TSMAE: A novel anomaly detection approach for Internet of Things time series data using memory-augmented autoencoder
With the development of communication technologies, the Internet of Things (IoT) has been widely
deployed and used in industrial manufacturing, intelligent transportation, and healthcare …
Spatiotemporal contrastive video representation learning
We present a self-supervised Contrastive Video Representation Learning (CVRL) method to
learn spatiotemporal visual representations from unlabeled videos. Our representations are …