Self-supervised learning for videos: A survey

MC Schiappa, YS Rawat, M Shah - ACM Computing Surveys, 2023 - dl.acm.org
The remarkable success of deep learning in various domains relies on the availability of
large-scale annotated datasets. However, obtaining annotations is expensive and requires …

SkeletonMAE: graph-based masked autoencoder for skeleton sequence pre-training

H Yan, Y Liu, Y Wei, Z Li, G Li… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Skeleton sequence representation learning has shown great advantages for action
recognition due to its promising ability to model human joints and topology. However, the …

Fine-grained temporal contrastive learning for weakly-supervised temporal action localization

J Gao, M Chen, C Xu - … of the IEEE/CVF conference on …, 2022 - openaccess.thecvf.com
We target at the task of weakly-supervised action localization (WSAL), where only video-
level action labels are available during model training. Despite the recent progress, existing …

Video-mined task graphs for keystep recognition in instructional videos

K Ashutosh, SK Ramakrishnan… - Advances in Neural …, 2023 - proceedings.neurips.cc
Procedural activity understanding requires perceiving human actions in terms of a broader
task, where multiple keysteps are performed in sequence across a long video to reach a …

Learning to predict activity progress by self-supervised video alignment

G Donahue, E Elhamifar - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
In this paper we tackle the problem of self-supervised video alignment and activity progress
prediction using in-the-wild videos. Our proposed self-supervised representation learning …

Progress-aware online action segmentation for egocentric procedural task videos

Y Shen, E Elhamifar - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
We address the problem of online action segmentation for egocentric procedural task
videos. While previous studies have mostly focused on offline action segmentation where …

StepFormer: Self-supervised step discovery and localization in instructional videos

N Dvornik, I Hadji, R Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Instructional videos are an important resource to learn procedural tasks from human
demonstrations. However, the instruction steps in such videos are typically short and sparse …

Learning fine-grained view-invariant representations from unpaired ego-exo videos via temporal alignment

ZS Xue, K Grauman - Advances in Neural Information …, 2023 - proceedings.neurips.cc
The egocentric and exocentric viewpoints of a human activity look dramatically different, yet
invariant representations to link them are essential for many potential applications in …

Drop-DTW: Aligning common signal between sequences while dropping outliers

N Dvornik, I Hadji, KG Derpanis… - Advances in Neural …, 2021 - proceedings.neurips.cc
In this work, we consider the problem of sequence-to-sequence alignment for signals
containing outliers. Assuming the absence of outliers, the standard Dynamic Time Warping …
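The Drop-DTW snippet contrasts its method with standard Dynamic Time Warping. For reference, here is a minimal sketch of that standard DTW recurrence, assuming 1-D numeric sequences and an absolute-difference local cost (both are illustrative assumptions, and the function name `dtw_distance` is hypothetical). This is not the Drop-DTW algorithm: every element of both sequences must be matched, which is why a single outlier inflates the alignment cost.

```python
import math
from typing import Sequence


def dtw_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Standard dynamic time warping cost between two 1-D sequences.

    Assumption: local cost is the absolute difference |x_i - y_j|.
    Every element of x and y must be matched; nothing can be dropped.
    """
    n, m = len(x), len(y)
    # D[i][j] = minimal cumulative cost of aligning x[:i] with y[:j]
    D = [[math.inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(x[i - 1] - y[j - 1])
            D[i][j] = cost + min(
                D[i - 1][j],      # advance in x only (y[j-1] matched again)
                D[i][j - 1],      # advance in y only (x[i-1] matched again)
                D[i - 1][j - 1],  # advance in both sequences
            )
    return D[n][m]
```

For instance, `dtw_distance([0, 1, 2], [0, 1, 1, 2])` returns `0.0` because the repeated `1` is absorbed by warping, while `dtw_distance([0, 1, 2], [0, 1, 9, 2])` returns `7.0`: the outlier `9` must still be matched to some element of the first sequence. Dropping such outliers during alignment is the problem Drop-DTW targets.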

Frame-wise action representations for long videos via sequence contrastive learning

M Chen, F Wei, C Li, D Cai - Proceedings of the IEEE/CVF …, 2022 - openaccess.thecvf.com
Prior works on action representation learning mainly focus on designing various
architectures to extract the global representations for short video clips. In contrast, many …