An outlook into the future of egocentric vision

C Plizzari, G Goletto, A Furnari, S Bansal… - International Journal of …, 2024 - Springer
What will the future be? We wonder! In this survey, we explore the gap between current
research in egocentric vision and the ever-anticipated future, where wearable computing …

Rethinking clip-based video learners in cross-domain open-vocabulary action recognition

KY Lin, H Ding, J Zhou, YM Tang, YX Peng… - arXiv preprint arXiv …, 2024 - arxiv.org
Building upon the impressive success of CLIP (Contrastive Language-Image Pretraining),
recent pioneering works have proposed to adapt the powerful CLIP to video data, leading to …

Human-centric transformer for domain adaptive action recognition

KY Lin, J Zhou, WS Zheng - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
We study the domain adaptation task for action recognition, namely domain adaptive action
recognition, which aims to effectively transfer action recognition power from a label-sufficient …

AFF-ttention! Affordances and Attention Models for Short-Term Object Interaction Anticipation

L Mur-Labadia, R Martinez-Cantin, JJ Guerrero… - … on Computer Vision, 2024 - Springer
Short-Term object-interaction Anticipation (STA) consists of detecting the location of
the next-active objects, the noun and verb categories of the interaction, and the time to …

Multimodal cross-domain few-shot learning for egocentric action recognition

M Hatano, R Hachiuma, R Fujii, H Saito - European Conference on …, 2024 - Springer
We address a novel cross-domain few-shot learning (CD-FSL) task with multimodal input
and unlabeled target data for egocentric action recognition. This paper simultaneously …

Are Synthetic Data Useful for Egocentric Hand-Object Interaction Detection?

R Leonardi, A Furnari, F Ragusa… - European Conference on …, 2024 - Springer
In this study, we investigate the effectiveness of synthetic data in enhancing egocentric hand-
object interaction detection. Via extensive experiments and comparative analyses on three …

EgoNCE++: Do egocentric video-language models really understand hand-object interactions?

B Xu, Z Wang, Y Du, Z Song, S Zheng, Q Jin - arXiv preprint arXiv …, 2024 - arxiv.org
Egocentric video-language pretraining is a crucial paradigm to advance the learning of
egocentric hand-object interactions (EgoHOI). Despite the great success on existing …

A Backpack Full of Skills: Egocentric Video Understanding with Diverse Task Perspectives

SA Peirone, F Pistilli, A Alliegro… - Proceedings of the …, 2024 - openaccess.thecvf.com
Human comprehension of a video stream is naturally broad: in a few instants we are able to
understand what is happening, the relevance and relationship of objects, and forecast what …

A survey on deep learning techniques for action anticipation

Z Zhong, M Martin, M Voit, J Gall, J Beyerer - arXiv preprint arXiv …, 2023 - arxiv.org
The ability to anticipate possible future human actions is essential for a wide range of
applications, including autonomous driving and human-robot interaction. Consequently …

What does CLIP know about peeling a banana?

C Cuttano, G Rosi, G Trivigno… - Proceedings of the …, 2024 - openaccess.thecvf.com
Humans show an innate capability to identify tools to support specific actions. The
association between object parts and the actions they facilitate is usually named …