Temporal action segmentation: An analysis of modern techniques

G Ding, F Sener, A Yao - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Temporal action segmentation (TAS) in videos aims at densely identifying video frames in
minutes-long videos with multiple action classes. As a long-range video understanding task …

Deep learning-based action detection in untrimmed videos: A survey

E Vahdani, Y Tian - IEEE Transactions on Pattern Analysis and …, 2022 - ieeexplore.ieee.org
Understanding human behavior and activity facilitates advancement of numerous real-world
applications, and is critical for video analysis. Despite the progress of action recognition …

Assembly101: A large-scale multi-view video dataset for understanding procedural activities

F Sener, D Chatterjee, D Shelepov… - Proceedings of the …, 2022 - openaccess.thecvf.com
Assembly101 is a new procedural activity dataset featuring 4321 videos of people
assembling and disassembling 101 "take-apart" toy vehicles. Participants work without fixed …

HowTo100M: Learning a text-video embedding by watching hundred million narrated video clips

A Miech, D Zhukov, JB Alayrac… - Proceedings of the …, 2019 - openaccess.thecvf.com
Learning text-video embeddings usually requires a dataset of video clips with manually
provided captions. However, such datasets are expensive and time consuming to create and …

Temporal cycle-consistency learning

D Dwibedi, Y Aytar, J Tompson… - Proceedings of the …, 2019 - openaccess.thecvf.com
We introduce a self-supervised representation learning method based on the task of
temporal alignment between videos. The method trains a network using temporal cycle …

Collaborative learning of semi-supervised segmentation and classification for medical images

Y Zhou, X He, L Huang, L Liu, F Zhu… - Proceedings of the …, 2019 - openaccess.thecvf.com
Medical image analysis has two important research areas: disease grading and fine-grained
lesion segmentation. Although the former problem often relies on the latter, the two are …

Cross-task weakly supervised learning from instructional videos

D Zhukov, JB Alayrac, RG Cinbis… - Proceedings of the …, 2019 - openaccess.thecvf.com
In this paper we investigate learning visual models for the steps of ordinary tasks using weak
supervision via instructional narrations and an ordered list of steps instead of strong …

Temporal aggregate representations for long-range video understanding

F Sener, D Singhania, A Yao - … Conference, Glasgow, UK, August 23–28 …, 2020 - Springer
Future prediction, especially in long-range videos, requires reasoning from current and past
observations. In this work, we address questions of temporal extent, scaling, and level of …

Improving action segmentation via graph-based temporal reasoning

Y Huang, Y Sugano, Y Sato - Proceedings of the IEEE/CVF …, 2020 - openaccess.thecvf.com
Temporal relations among multiple action segments play an important role in action
segmentation especially when observations are limited (e.g., actions are occluded by other …

TL;DW? Summarizing instructional videos with task relevance and cross-modal saliency

M Narasimhan, A Nagrani, C Sun, M Rubinstein… - … on Computer Vision, 2022 - Springer
YouTube users looking for instructions for a specific task may spend a long time browsing
content trying to find the right video that matches their needs. Creating a visual summary …