An outlook into the future of egocentric vision
What will the future be? We wonder! In this survey, we explore the gap between current
research in egocentric vision and the ever-anticipated future, where wearable computing …
Rethinking CLIP-based video learners in cross-domain open-vocabulary action recognition
Building upon the impressive success of CLIP (Contrastive Language-Image Pretraining),
recent pioneering works have proposed adapting the powerful CLIP to video data, leading to …
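For context, a minimal sketch of the zero-shot baseline such works adapt: per-frame CLIP embeddings are mean-pooled into a single video embedding and matched against text prompts. The model name, prompt template, and function are illustrative assumptions, not this paper's method.

```python
# Hedged sketch: zero-shot action recognition with off-the-shelf CLIP by
# averaging frame embeddings (a common baseline, not the paper's approach).
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def classify_clip_zero_shot(frames, action_names):
    """frames: list of PIL images sampled from a clip; action_names: list of str."""
    prompts = [f"a video of a person {a}" for a in action_names]  # assumed template
    inputs = processor(text=prompts, images=frames, return_tensors="pt", padding=True)
    with torch.no_grad():
        img = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
    video = img.mean(dim=0, keepdim=True)            # temporal mean pooling
    video = video / video.norm(dim=-1, keepdim=True)
    txt = txt / txt.norm(dim=-1, keepdim=True)
    scores = (video @ txt.T).squeeze(0)              # cosine similarity per action
    return action_names[scores.argmax().item()]
```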
Human-centric transformer for domain adaptive action recognition
We study the domain adaptation task for action recognition, namely domain adaptive action
recognition, which aims to effectively transfer action recognition power from a label-sufficient …
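For orientation, a hedged sketch of the gradient reversal layer, a classic building block in domain-adaptive recognition pipelines; it is illustrative background, not the paper's human-centric transformer.

```python
# Hedged sketch: gradient reversal for domain-adversarial training
# (a standard technique, not this paper's specific architecture).
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, lam):
        ctx.lam = lam
        return x.view_as(x)

    @staticmethod
    def backward(ctx, grad_out):
        # Identity forward, negated gradient backward: the feature extractor
        # learns to confuse the domain classifier attached after this layer.
        return -ctx.lam * grad_out, None

def grad_reverse(x, lam=1.0):
    return GradReverse.apply(x, lam)
```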
AFF-ttention! Affordances and Attention Models for Short-Term Object Interaction Anticipation
Short-Term object-interaction Anticipation (STA) consists of detecting the location of
the next-active objects, the noun and verb categories of the interaction, and the time to …
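As a reading aid, the STA prediction tuple described above sketched as a data structure; the field names (including time_to_contact, completing the snippet's truncated "time to …") are illustrative assumptions, not the paper's API.

```python
# Hedged sketch of an STA prediction record; field names are assumptions.
from dataclasses import dataclass

@dataclass
class STAPrediction:
    box: tuple[float, float, float, float]  # next-active object (x1, y1, x2, y2)
    noun: str                               # object category, e.g. "knife"
    verb: str                               # interaction category, e.g. "take"
    time_to_contact: float                  # seconds until the interaction begins
    score: float                            # detection confidence
```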
Multimodal cross-domain few-shot learning for egocentric action recognition
We address a novel cross-domain few-shot learning task (CD-FSL) with multimodal input
and unlabeled target data for egocentric action recognition. This paper simultaneously …
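For context, a hedged sketch of the nearest-prototype classification step that many few-shot pipelines build on; this is a common baseline, not this paper's multimodal CD-FSL method.

```python
# Hedged sketch: prototypical-network style few-shot classification
# (a standard baseline, not the paper's approach).
import torch

def prototype_classify(support, support_labels, query, n_way):
    """support: (N*K, D) embeddings; support_labels: (N*K,); query: (Q, D)."""
    protos = torch.stack([support[support_labels == c].mean(0)
                          for c in range(n_way)])   # (n_way, D) class prototypes
    dists = torch.cdist(query, protos)              # (Q, n_way) Euclidean distances
    return dists.argmin(dim=1)                      # nearest-prototype labels
```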
Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection?
In this study, we investigate the effectiveness of synthetic data in enhancing egocentric hand-
object interaction detection. Via extensive experiments and comparative analyses on three …
EgoNCE++: Do egocentric video-language models really understand hand-object interactions?
Egocentric video-language pretraining is a crucial paradigm to advance the learning of
egocentric hand-object interactions (EgoHOI). Despite the great success on existing …
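For context, a hedged sketch of the symmetric InfoNCE objective that video-language pretraining of this kind builds on; this is the generic contrastive loss, not EgoNCE++ itself.

```python
# Hedged sketch: symmetric InfoNCE for video-language contrastive pretraining
# (generic objective; EgoNCE++'s specific modifications are not shown).
import torch
import torch.nn.functional as F

def info_nce(video_emb, text_emb, temperature=0.07):
    """video_emb, text_emb: (B, D) L2-normalized embeddings of paired clips/captions."""
    logits = video_emb @ text_emb.T / temperature   # (B, B) similarity matrix
    targets = torch.arange(video_emb.size(0), device=video_emb.device)
    # Matched pairs lie on the diagonal; contrast in both directions.
    return (F.cross_entropy(logits, targets) +
            F.cross_entropy(logits.T, targets)) / 2
```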
A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives
Human comprehension of a video stream is naturally broad: in a few instants we are able to
understand what is happening, the relevance and relationship of objects, and forecast what …
A survey on deep learning techniques for action anticipation
The ability to anticipate possible future human actions is essential for a wide range of
applications, including autonomous driving and human-robot interaction. Consequently …
What does CLIP know about peeling a banana?
Humans show an innate capability to identify tools to support specific actions. The
association between object parts and the actions they facilitate is usually named …
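In the spirit of the title's question, a hedged probe: score an object image against affordance-style prompts with off-the-shelf CLIP. The prompt template, verb list, and model name are illustrative assumptions, not the paper's evaluation protocol.

```python
# Hedged sketch: probing CLIP for affordance knowledge via text prompts
# (illustrative probe, not the paper's method).
import torch
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def affordance_scores(image, verbs=("peel", "cut", "pour", "open")):
    prompts = [f"something you can {v}" for v in verbs]  # assumed template
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        logits = model(**inputs).logits_per_image.squeeze(0)  # (len(verbs),)
    return dict(zip(verbs, logits.softmax(-1).tolist()))
```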