Socratic models: Composing zero-shot multimodal reasoning with language
Large pretrained (e.g., "foundation") models exhibit distinct capabilities depending on the
domain of data they are trained on. While these domains are generic, they may only barely …
Analysis of the hands in egocentric vision: A survey
Egocentric vision (aka first-person vision–FPV) applications have thrived over the past few
years, thanks to the availability of affordable wearable cameras and large annotated …
Ego4D: Around the world in 3,000 hours of egocentric video
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …
A review on video summarization techniques
The exponential growth of technology has resulted in a profusion of advanced imaging
devices and eased internet accessibility, leading to an increase in the creation and use of …
Video summarization using deep neural networks: A survey
Video summarization technologies aim to create a concise and complete synopsis by
selecting the most informative parts of the video content. Several approaches have been …
NaQ: Leveraging narrations as queries to supervise episodic memory
Searching long egocentric videos with natural language queries (NLQ) has compelling
applications in augmented reality and robotics, where a fluid index into everything that a …
Video summarization using deep learning techniques: a detailed analysis and investigation
One of the critical multimedia analysis problems in today's digital world is video
summarization (VS). Many VS methods have been suggested based on deep learning …
An outlook into the future of egocentric vision
What will the future be? We wonder! In this survey, we explore the gap between current
research in egocentric vision and the ever-anticipated future, where wearable computing …
E2(GO)MOTION: Motion augmented event stream for egocentric action recognition
Event cameras are novel bio-inspired sensors, which asynchronously capture pixel-level
intensity changes in the form of "events". Due to their sensing mechanism, event cameras …
Ego-Exo: Transferring visual representations from third-person to first-person videos
We introduce an approach for pre-training egocentric video models using large-scale third-
person video datasets. Learning from purely egocentric data is limited by low dataset scale …