Ego4D: Around the world in 3,000 hours of egocentric video

K Grauman, A Westbury, E Byrne… - Proceedings of the …, 2022 - openaccess.thecvf.com
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …

Temporal action segmentation: An analysis of modern techniques

G Ding, F Sener, A Yao - IEEE Transactions on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Temporal action segmentation (TAS) in videos aims to densely label the frames of
minutes-long videos with multiple action classes. As a long-range video understanding task …

Ego-Exo4D: Understanding skilled human activity from first- and third-person perspectives

K Grauman, A Westbury, L Torresani… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present Ego-Exo4D, a diverse, large-scale, multimodal, multiview video dataset
and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric …

Assembly101: A large-scale multi-view video dataset for understanding procedural activities

F Sener, D Chatterjee, D Shelepov… - Proceedings of the …, 2022 - openaccess.thecvf.com
Assembly101 is a new procedural activity dataset featuring 4321 videos of people
assembling and disassembling 101 "take-apart" toy vehicles. Participants work without fixed …

EPIC-KITCHENS VISOR benchmark: Video segmentations and object relations

A Darkhalil, D Shan, B Zhu, J Ma… - Advances in …, 2022 - proceedings.neurips.cc
We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for
segmenting hands and active objects in egocentric video. VISOR annotates videos from …

HoloAssist: An egocentric human interaction dataset for interactive AI assistants in the real world

X Wang, T Kwon, M Rad, B Pan… - Proceedings of the …, 2023 - openaccess.thecvf.com
Building an interactive AI assistant that can perceive, reason, and collaborate with humans
in the real world has been a long-standing pursuit in the AI community. This work is part of a …

H2O: Two hands manipulating objects for first person interaction recognition

T Kwon, B Tekin, J Stühmer, F Bogo… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present a comprehensive framework for egocentric interaction recognition using
markerless 3D annotations of two hands manipulating objects. To this end, we propose a …

EgoObjects: A large-scale egocentric dataset for fine-grained object understanding

C Zhu, F Xiao, A Alvarado, Y Babaei… - Proceedings of the …, 2023 - openaccess.thecvf.com
Object understanding in egocentric visual data is arguably a fundamental research topic in
egocentric vision. However, existing object datasets are either non-egocentric or have …

Error detection in egocentric procedural task videos

SP Lee, Z Lu, Z Zhang, M Hoai… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present a new egocentric procedural error dataset containing videos with various types
of errors as well as normal videos and propose a new framework for procedural error …

Learning to predict activity progress by self-supervised video alignment

G Donahue, E Elhamifar - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
In this paper we tackle the problem of self-supervised video alignment and activity progress
prediction using in-the-wild videos. Our proposed self-supervised representation learning …