Ego-Exo4D: Understanding skilled human activity from first- and third-person perspectives

K Grauman, A Westbury, L Torresani… - Proceedings of the …, 2024 - openaccess.thecvf.com
We present Ego-Exo4D, a diverse, large-scale multimodal multiview video dataset
and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric …

Video-mined task graphs for keystep recognition in instructional videos

K Ashutosh, SK Ramakrishnan… - Advances in Neural …, 2024 - proceedings.neurips.cc
Procedural activity understanding requires perceiving human actions in terms of a broader
task, where multiple keysteps are performed in sequence across a long video to reach a …

Progress-aware online action segmentation for egocentric procedural task videos

Y Shen, E Elhamifar - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
We address the problem of online action segmentation for egocentric procedural task
videos. While previous studies have mostly focused on offline action segmentation where …

StepFormer: Self-supervised step discovery and localization in instructional videos

N Dvornik, I Hadji, R Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Instructional videos are an important resource to learn procedural tasks from human
demonstrations. However, the instruction steps in such videos are typically short and sparse …

PDPP: Projected diffusion for procedure planning in instructional videos

H Wang, Y Wu, S Guo, L Wang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
In this paper, we study the problem of procedure planning in instructional videos, which aims
to make goal-directed plans given the current visual observations in unstructured real-life …

Learning to ground instructional articles in videos through narrations

E Mavroudi, T Afouras… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
In this paper, we present an approach for localizing steps of procedural activities in narrated
how-to videos. To deal with the scarcity of labeled data at scale, we source the step …

Detours for navigating instructional videos

K Ashutosh, Z Xue, T Nagarajan… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce the video detours problem for navigating instructional videos. Given a source
video and a natural language query asking to alter the how-to video's current path of …

ExpertAF: Expert actionable feedback from video

K Ashutosh, T Nagarajan, G Pavlakos, K Kitani… - arXiv preprint arXiv …, 2024 - arxiv.org
Feedback is essential for learning a new skill or improving one's current skill-level. However,
current methods for skill-assessment from video only provide scores or compare …

Every Problem, Every Step, All In Focus: Learning to Solve Vision-Language Problems with Integrated Attention

X Chen, J Yang, S Chen, L Wang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Integrating information from vision and language modalities has sparked interesting
applications in the fields of computer vision and natural language processing. Existing …

Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos

KRY Nagasinghe, H Zhou… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this paper, we explore the capability of an agent to construct a logical sequence of action
steps, thereby assembling a strategic procedural plan. This plan is crucial for navigating from …