Ego4d: Around the world in 3,000 hours of egocentric video
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …
Rescaling egocentric vision: Collection, pipeline and challenges for epic-kitchens-100
This paper introduces the pipeline to extend the largest dataset in egocentric vision, EPIC-
KITCHENS. The effort culminates in EPIC-KITCHENS-100, a collection of 100 hours, 20M …
Epic-fusion: Audio-visual temporal binding for egocentric action recognition
We focus on multi-modal fusion for egocentric action recognition, and propose a novel
architecture for multi-modal temporal-binding, i.e., the combination of modalities within a …
The epic-kitchens dataset: Collection, challenges and baselines
Since its introduction in 2018, EPIC-KITCHENS has attracted attention as the largest
egocentric video benchmark, offering a unique viewpoint on people's interaction with …
Scaling egocentric vision: The epic-kitchens dataset
First-person vision is gaining interest as it offers a unique viewpoint on people's interaction
with objects, their attention, and even intention. However, progress in this challenging …
On semantic similarity in video retrieval
Current video retrieval efforts all found their evaluation on an instance-based assumption,
that only a single caption is relevant to a query video and vice versa. We demonstrate that …
Diagnosing error in temporal action detectors
Despite the recent progress in video understanding and the continuous rate of improvement
in temporal action localization throughout the years, it is still unclear how far (or close?) we …
Egocentric vision-based action recognition: A survey
The egocentric action recognition (EAR) field has recently increased its popularity due to the
affordable and lightweight wearable cameras available nowadays such as GoPro and …
Basictad: an astounding rgb-only baseline for temporal action detection
Temporal action detection (TAD) is extensively studied in the video understanding
community by generally following the object detection pipeline in images. However, complex …
A generalized and robust framework for timestamp supervision in temporal action segmentation
In temporal action segmentation, Timestamp Supervision requires only a handful of labelled
frames per video sequence. For unlabelled frames, previous works rely on assigning hard …