Ego4d: Around the world in 3,000 hours of egocentric video
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …
Temporal action segmentation: An analysis of modern techniques
Temporal action segmentation (TAS) in videos aims at densely identifying video frames in
minutes-long videos with multiple action classes. As a long-range video understanding task …
Ego-exo4d: Understanding skilled human activity from first-and third-person perspectives
We present Ego-Exo4D, a diverse, large-scale, multimodal, multiview video dataset
and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric …
Assembly101: A large-scale multi-view video dataset for understanding procedural activities
Assembly101 is a new procedural activity dataset featuring 4321 videos of people
assembling and disassembling 101 "take-apart" toy vehicles. Participants work without fixed …
Epic-kitchens visor benchmark: Video segmentations and object relations
We introduce VISOR, a new dataset of pixel annotations and a benchmark suite for
segmenting hands and active objects in egocentric video. VISOR annotates videos from …
Holoassist: an egocentric human interaction dataset for interactive ai assistants in the real world
Building an interactive AI assistant that can perceive, reason, and collaborate with humans
in the real world has been a long-standing pursuit in the AI community. This work is part of a …
H2o: Two hands manipulating objects for first person interaction recognition
We present a comprehensive framework for egocentric interaction recognition using
markerless 3D annotations of two hands manipulating objects. To this end, we propose a …
Egoobjects: A large-scale egocentric dataset for fine-grained object understanding
Object understanding in egocentric visual data is arguably a fundamental research topic in
egocentric vision. However, existing object datasets are either non-egocentric or have …
Error detection in egocentric procedural task videos
We present a new egocentric procedural error dataset containing videos with various types
of errors as well as normal videos and propose a new framework for procedural error …
Learning to predict activity progress by self-supervised video alignment
In this paper we tackle the problem of self-supervised video alignment and activity progress
prediction using in-the-wild videos. Our proposed self-supervised representation learning …