Temporal action segmentation: An analysis of modern techniques
Temporal action segmentation (TAS) in videos aims at densely identifying video frames in
minutes-long videos with multiple action classes. As a long-range video understanding task …
minutes-long videos with multiple action classes. As a long-range video understanding task …
A survey of human action recognition and posture prediction
Human action recognition and posture prediction aim to recognize and predict respectively
the action and postures of persons in videos. They are both active research topics in …
the action and postures of persons in videos. They are both active research topics in …
Merlot: Multimodal neural script knowledge models
As humans, we understand events in the visual world contextually, performing multimodal
reasoning across time to make inferences about the past, present, and future. We introduce …
reasoning across time to make inferences about the past, present, and future. We introduce …
Rescaling egocentric vision: Collection, pipeline and challenges for epic-kitchens-100
This paper introduces the pipeline to extend the largest dataset in egocentric vision, EPIC-
KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M …
KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M …
Fine-grained temporal contrastive learning for weakly-supervised temporal action localization
We target at the task of weakly-supervised action localization (WSAL), where only video-
level action labels are available during model training. Despite the recent progress, existing …
level action labels are available during model training. Despite the recent progress, existing …
Video-mined task graphs for keystep recognition in instructional videos
K Ashutosh, SK Ramakrishnan… - Advances in Neural …, 2024 - proceedings.neurips.cc
Procedural activity understanding requires perceiving human actions in terms of a broader
task, where multiple keysteps are performed in sequence across a long video to reach a …
task, where multiple keysteps are performed in sequence across a long video to reach a …
Few-shot video classification via temporal alignment
Difficulty in collecting and annotating large-scale video data raises a growing interest in
learning models which can recognize novel classes with only a few training examples. In …
learning models which can recognize novel classes with only a few training examples. In …
Fact: Frame-action cross-attention temporal modeling for efficient action segmentation
Z Lu, E Elhamifar - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
We study supervised action segmentation whose goal is to predict framewise action labels
of a video. To capture temporal dependencies over long horizons prior works either improve …
of a video. To capture temporal dependencies over long horizons prior works either improve …
Progress-aware online action segmentation for egocentric procedural task videos
Y Shen, E Elhamifar - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
We address the problem of online action segmentation for egocentric procedural task
videos. While previous studies have mostly focused on offline action segmentation where …
videos. While previous studies have mostly focused on offline action segmentation where …
My view is the best view: Procedure learning from egocentric videos
Procedure learning involves identifying the key-steps and determining their logical order to
perform a task. Existing approaches commonly use third-person videos for learning the …
perform a task. Existing approaches commonly use third-person videos for learning the …