VideoMAE V2: Scaling video masked autoencoders with dual masking
Scale is the primary factor for building a powerful foundation model that could well
generalize to a variety of downstream tasks. However, it is still challenging to train video …
MGMAE: Motion guided masking for video masked autoencoding
Masked autoencoding has shown excellent performance on self-supervised video
representation learning. Temporal redundancy has led to a high masking ratio and …
Learning to predict activity progress by self-supervised video alignment
In this paper we tackle the problem of self-supervised video alignment and activity progress
prediction using in-the-wild videos. Our proposed self-supervised representation learning …
Masked modeling for self-supervised representation learning on vision and beyond
As the deep learning revolution marches on, self-supervised learning has garnered
increasing attention in recent years thanks to its remarkable representation learning ability …
From static to dynamic: Adapting landmark-aware image models for facial expression recognition in videos
Dynamic facial expression recognition (DFER) in the wild is still hindered by data limitations,
e.g., insufficient quantity and diversity of pose, occlusion and illumination, as well as the …
Masked motion encoding for self-supervised video representation learning
How to learn discriminative video representation from unlabeled videos is challenging but
crucial for video analysis. The latest attempts seek to learn a representation model by …
Masked autoencoders in computer vision: A comprehensive survey
Z Zhou, X Liu - IEEE Access, 2023 - ieeexplore.ieee.org
Masked autoencoders (MAE) is a deep learning method based on Transformer. Originally
used for images, it has now been extended to video, audio, and some other temporal …
Darkness-adaptive action recognition: Leveraging efficient tubelet slow-fast network for industrial applications
Infrared (IR) technology has emerged as a solution for monitoring dark environments. It
offers resilience to shifting illumination, appearance changes, and shadows, with …
Contextual visual and motion salient fusion framework for action recognition in dark environments
Infrared (IR) human action recognition (AR) exhibits resilience against shifting illumination
conditions, changes in appearance, and shadows. It has valuable applications in numerous …
AMS-Net: Modeling adaptive multi-granularity spatio-temporal cues for video action recognition
Effective spatio-temporal modeling as a core of video representation learning is challenged
by complex scale variations in spatio-temporal cues in videos, especially different visual …