Socratic models: Composing zero-shot multimodal reasoning with language

A Zeng, M Attarian, B Ichter, K Choromanski… - arXiv preprint arXiv …, 2022 - arxiv.org
Large pretrained (e.g., "foundation") models exhibit distinct capabilities depending on the
domain of data they are trained on. While these domains are generic, they may only barely …

Analysis of the hands in egocentric vision: A survey

A Bandini, J Zariffa - IEEE Transactions on Pattern Analysis and …, 2020 - ieeexplore.ieee.org
Egocentric vision (aka first-person vision–FPV) applications have thrived over the past few
years, thanks to the availability of affordable wearable cameras and large annotated …

Toward storytelling from visual lifelogging: An overview

M Bolaños, M Dimiccoli… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org
Visual lifelogging consists of acquiring images that capture the daily experiences of the user
by wearing a camera over a long period of time. The pictures taken offer considerable …

3D hand shape and pose from images in the wild

A Boukhayma, R Bem, PHS Torr - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
We present in this work the first end-to-end deep learning based method that predicts both
3D hand shape and pose from RGB images in the wild. Our network consists of the …

Fine-grained egocentric hand-object segmentation: Dataset, model, and applications

L Zhang, S Zhou, S Stent, J Shi - European Conference on Computer …, 2022 - Springer
Egocentric videos offer fine-grained information for high-fidelity modeling of human
behaviors. Hands and interacting objects are one crucial aspect of understanding a viewer's …

Lending a hand: Detecting hands and recognizing activities in complex egocentric interactions

S Bambach, S Lee, DJ Crandall… - Proceedings of the IEEE …, 2015 - openaccess.thecvf.com
Hands appear very often in egocentric video, and their appearance and pose give important
cues about what people are doing and what they are paying attention to. But existing work in …

Egocentric audio-visual object localization

C Huang, Y Tian, A Kumar… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Humans naturally perceive surrounding scenes by unifying sound and sight in a first-person
view. Likewise, machines are advanced to approach human intelligence by learning with …

Going deeper into first-person activity recognition

M Ma, H Fan, KM Kitani - Proceedings of the IEEE Conference on …, 2016 - cv-foundation.org
We bring together ideas from recent work on feature design for egocentric action recognition
under one framework by exploring the use of deep convolutional neural networks (CNN) …

Survey on 3D hand gesture recognition

H Cheng, L Yang, Z Liu - … on Circuits and Systems for Video …, 2015 - ieeexplore.ieee.org
Three-dimensional hand gesture recognition has attracted increasing research interests in
computer vision, pattern recognition, and human-computer interaction. The emerging depth …

Future person localization in first-person videos

T Yagi, K Mangalam, R Yonetani… - Proceedings of the …, 2018 - openaccess.thecvf.com
We present a new task that predicts future locations of people observed in first-person
videos. Consider a first-person video stream continuously recorded by a wearable camera …