EgoExo-Fitness: Towards Egocentric and Exocentric Full-Body Action Understanding

YM Li, WJ Huang, AL Wang, LA Zeng, JK Meng… - … on Computer Vision, 2024 - Springer
We present EgoExo-Fitness, a new full-body action understanding dataset,
featuring fitness sequence videos recorded from synchronized egocentric and fixed …

A Comprehensive Survey of Action Quality Assessment: Method and Benchmark

K Zhou, R Cai, L Wang, HPH Shum, X Liang - arXiv preprint arXiv …, 2024 - arxiv.org
Action Quality Assessment (AQA) quantitatively evaluates the quality of human actions,
providing automated assessments that reduce biases in human judgment. Its applications …

EgoVideo: Exploring Egocentric Foundation Model and Downstream Adaptation

B Pei, G Chen, J Xu, Y He, Y Liu, K Pan… - arXiv preprint arXiv …, 2024 - arxiv.org
In this report, we present our solutions to the EgoVis Challenges in CVPR 2024, including
five tracks in the Ego4D challenge and three tracks in the EPIC-Kitchens challenge. Building …

Masked Video and Body-Worn IMU Autoencoder for Egocentric Action Recognition

M Zhang, Y Huang, R Liu, Y Sato - European Conference on Computer …, 2024 - Springer
Compared with visual signals, Inertial Measurement Units (IMUs) placed on human limbs
can capture accurate motion signals while being robust to lighting variation and occlusion …

Tell, Don't Show!: Language Guidance Eases Transfer Across Domains in Images and Videos

T Kalluri, BP Majumder, M Chandraker - arXiv preprint arXiv:2403.05535, 2024 - arxiv.org
We introduce LaGTran, a novel framework that utilizes text supervision to guide robust
transfer of discriminative knowledge from labeled source to unlabeled target data with …

Unlocking Exocentric Video-Language Data for Egocentric Video Representation Learning

ZY Dou, X Yang, T Nagarajan, H Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
We present EMBED (Egocentric Models Built with Exocentric Data), a method designed to
transform exocentric video-language data for egocentric video representation learning …

Vinci: A Real-time Embodied Smart Assistant based on Egocentric Vision-Language Model

Y Huang, J Xu, B Pei, Y He, G Chen, L Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce Vinci, a real-time embodied smart assistant built upon an egocentric vision-
language model. Designed for deployment on portable devices such as smartphones and …

CG-Bench: Clue-Grounded Question Answering Benchmark for Long Video Understanding

G Chen, Y Liu, Y Huang, Y He, B Pei, J Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Most existing video understanding benchmarks for multimodal large language models
(MLLMs) focus only on short videos. The limited number of benchmarks for long video …

Egocentric Vehicle Dense Video Captioning

F Chen, C Xu, Q Jia, Y Wang, Y Liu, H Zhang… - Proceedings of the …, 2024 - dl.acm.org
Traditional dense video captioning predominantly focuses on edited exocentric footage.
These videos are filmed from an external perspective and generally feature distinct …

VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI

S Cheng, K Fang, Y Yu, S Zhou, B Li, Y Tian… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in Multi-modal Large Language Models (MLLMs) have opened new
avenues for applications in Embodied AI. Building on our previous work, EgoThink, we introduce …