Ego4D: Around the world in 3,000 hours of egocentric video
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …
A review of machine learning-based human activity recognition for diverse applications
Human activity recognition (HAR) is a very active yet challenging and demanding area of
computer science. Due to the articulated nature of human motion, it is not trivial to detect …
Self-supervised co-training for video representation learning
The objective of this paper is visual-only self-supervised video representation learning. We
make the following contributions: (i) we investigate the benefit of adding semantic-class …
Memory-augmented dense predictive coding for video representation learning
The objective of this paper is self-supervised learning from video, in particular for
representations for action recognition. We make the following contributions: (i) We propose a …
SpeedNet: Learning the speediness in videos
We wish to automatically predict the "speediness" of moving objects in videos, whether they
move faster, at, or slower than their "natural" speed. The core component in our approach is …
Distilling vision-language models on millions of videos
The recent advance in vision-language models is largely attributed to the abundance of
image-text data. We aim to replicate this success for video-language models but there …
Deep learning for computer vision based activity recognition and fall detection of the elderly: a systematic review
As the proportion of elderly individuals in developed countries continues to rise globally,
addressing their healthcare needs, particularly in preserving their autonomy, is of paramount …
RSPNet: Relative speed perception for unsupervised video representation learning
We study unsupervised video representation learning that seeks to learn both motion and
appearance features from unlabeled video only, which can be reused for downstream tasks …
Learning object state changes in videos: An open-world perspective
Object State Changes (OSCs) are pivotal for video understanding. While humans
can effortlessly generalize OSC understanding from familiar to unknown objects, current …
FunQA: Towards surprising video comprehension
Surprising videos, e.g., funny clips, creative performances, or visual illusions, attract
significant attention. Enjoyment of these videos is not simply a response to visual stimuli; …