Ego4D: Around the world in 3,000 hours of egocentric video

K Grauman, A Westbury, E Byrne… - Proceedings of the …, 2022 - openaccess.thecvf.com
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …

A review of machine learning-based human activity recognition for diverse applications

F Kulsoom, S Narejo, Z Mehmood… - Neural Computing and …, 2022 - Springer
Human activity recognition (HAR) is a very active yet challenging and demanding area of
computer science. Due to the articulated nature of human motion, it is not trivial to detect …

Self-supervised co-training for video representation learning

T Han, W Xie, A Zisserman - Advances in neural information …, 2020 - proceedings.neurips.cc
The objective of this paper is visual-only self-supervised video representation learning. We
make the following contributions: (i) we investigate the benefit of adding semantic-class …

Memory-augmented dense predictive coding for video representation learning

T Han, W Xie, A Zisserman - European conference on computer vision, 2020 - Springer
The objective of this paper is self-supervised learning from video, in particular for
representations for action recognition. We make the following contributions: (i) We propose a …

SpeedNet: Learning the speediness in videos

S Benaim, A Ephrat, O Lang, I Mosseri… - Proceedings of the …, 2020 - openaccess.thecvf.com
We wish to automatically predict the "speediness" of moving objects in videos, whether they
move faster, at, or slower than their "natural" speed. The core component in our approach is …

Distilling vision-language models on millions of videos

Y Zhao, L Zhao, X Zhou, J Wu… - Proceedings of the …, 2024 - openaccess.thecvf.com
The recent advance in vision-language models is largely attributed to the abundance of
image-text data. We aim to replicate this success for video-language models but there …

Deep learning for computer vision based activity recognition and fall detection of the elderly: a systematic review

FX Gaya-Morey, C Manresa-Yee, JM Buades-Rubio - Applied Intelligence, 2024 - Springer
As the proportion of elderly individuals in developed countries continues to rise globally,
addressing their healthcare needs, particularly in preserving their autonomy, is of paramount …

RSPNet: Relative speed perception for unsupervised video representation learning

P Chen, D Huang, D He, X Long, R Zeng… - Proceedings of the …, 2021 - ojs.aaai.org
We study unsupervised video representation learning that seeks to learn both motion and
appearance features from unlabeled video only, which can be reused for downstream tasks …

Learning object state changes in videos: An open-world perspective

Z Xue, K Ashutosh, K Grauman - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Object State Changes (OSCs) are pivotal for video understanding. While humans
can effortlessly generalize OSC understanding from familiar to unknown objects, current …

FunQA: Towards surprising video comprehension

B Xie, S Zhang, Z Zhou, B Li, Y Zhang, J Hessel… - … on Computer Vision, 2024 - Springer
Surprising videos, e.g., funny clips, creative performances, or visual illusions, attract
significant attention. Enjoyment of these videos is not simply a response to visual stimuli; …