Socratic models: Composing zero-shot multimodal reasoning with language

A Zeng, M Attarian, B Ichter, K Choromanski… - arXiv preprint arXiv …, 2022 - arxiv.org
Large pretrained (e.g., "foundation") models exhibit distinct capabilities depending on the
domain of data they are trained on. While these domains are generic, they may only barely …

Analysis of the hands in egocentric vision: A survey

A Bandini, J Zariffa - IEEE transactions on pattern analysis and …, 2020 - ieeexplore.ieee.org
Egocentric vision (aka first-person vision–FPV) applications have thrived over the past few
years, thanks to the availability of affordable wearable cameras and large annotated …

Ego4D: Around the world in 3,000 hours of egocentric video

K Grauman, A Westbury, E Byrne… - Proceedings of the …, 2022 - openaccess.thecvf.com
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite. It
offers 3,670 hours of daily-life activity video spanning hundreds of scenarios (household …

A review on video summarization techniques

P Meena, H Kumar, SK Yadav - Engineering Applications of Artificial …, 2023 - Elsevier
The exponential growth of technology has resulted in a profusion of advanced imaging
devices and eases internet accessibility, leading to an increase in the creation and use of …

Video summarization using deep neural networks: A survey

E Apostolidis, E Adamantidou, AI Metsai… - Proceedings of the …, 2021 - ieeexplore.ieee.org
Video summarization technologies aim to create a concise and complete synopsis by
selecting the most informative parts of the video content. Several approaches have been …

NaQ: Leveraging narrations as queries to supervise episodic memory

SK Ramakrishnan, Z Al-Halah… - Proceedings of the …, 2023 - openaccess.thecvf.com
Searching long egocentric videos with natural language queries (NLQ) has compelling
applications in augmented reality and robotics, where a fluid index into everything that a …

Video summarization using deep learning techniques: a detailed analysis and investigation

P Saini, K Kumar, S Kashid, A Saini, A Negi - Artificial Intelligence Review, 2023 - Springer
One of the critical multimedia analysis problems in today's digital world is video
summarization (VS). Many VS methods have been suggested based on deep learning …

An outlook into the future of egocentric vision

C Plizzari, G Goletto, A Furnari, S Bansal… - International Journal of …, 2024 - Springer
What will the future be? We wonder! In this survey, we explore the gap between current
research in egocentric vision and the ever-anticipated future, where wearable computing …

E2(GO)MOTION: Motion augmented event stream for egocentric action recognition

C Plizzari, M Planamente, G Goletto… - Proceedings of the …, 2022 - openaccess.thecvf.com
Event cameras are novel bio-inspired sensors, which asynchronously capture pixel-level
intensity changes in the form of "events". Due to their sensing mechanism, event cameras …

Ego-exo: Transferring visual representations from third-person to first-person videos

Y Li, T Nagarajan, B Xiong… - Proceedings of the …, 2021 - openaccess.thecvf.com
We introduce an approach for pre-training egocentric video models using large-scale third-
person video datasets. Learning from purely egocentric data is limited by low dataset scale …