Temporal action segmentation: An analysis of modern techniques

G Ding, F Sener, A Yao - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Temporal action segmentation (TAS) in videos aims at densely identifying video frames in
minutes-long videos with multiple action classes. As a long-range video understanding task …

A survey of human action recognition and posture prediction

N Ma, Z Wu, Y Cheung, Y Guo, Y Gao… - Tsinghua Science …, 2022 - ieeexplore.ieee.org
Human action recognition and posture prediction aim to recognize and predict respectively
the action and postures of persons in videos. They are both active research topics in …

Merlot: Multimodal neural script knowledge models

R Zellers, X Lu, J Hessel, Y Yu… - Advances in neural …, 2021 - proceedings.neurips.cc
As humans, we understand events in the visual world contextually, performing multimodal
reasoning across time to make inferences about the past, present, and future. We introduce …

Rescaling egocentric vision: Collection, pipeline and challenges for epic-kitchens-100

D Damen, H Doughty, GM Farinella, A Furnari… - International Journal of …, 2022 - Springer
This paper introduces the pipeline to extend the largest dataset in egocentric vision, EPIC-
KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M …

Fine-grained temporal contrastive learning for weakly-supervised temporal action localization

J Gao, M Chen, C Xu - … of the IEEE/CVF conference on …, 2022 - openaccess.thecvf.com
We target at the task of weakly-supervised action localization (WSAL), where only video-
level action labels are available during model training. Despite the recent progress, existing …

Video-mined task graphs for keystep recognition in instructional videos

K Ashutosh, SK Ramakrishnan… - Advances in Neural …, 2024 - proceedings.neurips.cc
Procedural activity understanding requires perceiving human actions in terms of a broader
task, where multiple keysteps are performed in sequence across a long video to reach a …

Few-shot video classification via temporal alignment

K Cao, J Ji, Z Cao, CY Chang… - Proceedings of the …, 2020 - openaccess.thecvf.com
Difficulty in collecting and annotating large-scale video data raises a growing interest in
learning models which can recognize novel classes with only a few training examples. In …

Fact: Frame-action cross-attention temporal modeling for efficient action segmentation

Z Lu, E Elhamifar - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
We study supervised action segmentation whose goal is to predict framewise action labels
of a video. To capture temporal dependencies over long horizons prior works either improve …

Progress-aware online action segmentation for egocentric procedural task videos

Y Shen, E Elhamifar - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
We address the problem of online action segmentation for egocentric procedural task
videos. While previous studies have mostly focused on offline action segmentation where …

My view is the best view: Procedure learning from egocentric videos

S Bansal, C Arora, CV Jawahar - European Conference on Computer …, 2022 - Springer
Procedure learning involves identifying the key-steps and determining their logical order to
perform a task. Existing approaches commonly use third-person videos for learning the …