Ego-Exo4D: Understanding skilled human activity from first- and third-person perspectives
We present Ego-Exo4D, a diverse, large-scale, multimodal, multiview video dataset
and benchmark challenge. Ego-Exo4D centers around simultaneously-captured egocentric …
Video-mined task graphs for keystep recognition in instructional videos
Procedural activity understanding requires perceiving human actions in terms of a broader
task, where multiple keysteps are performed in sequence across a long video to reach a …
Progress-aware online action segmentation for egocentric procedural task videos
We address the problem of online action segmentation for egocentric procedural task
videos. While previous studies have mostly focused on offline action segmentation where …
StepFormer: Self-supervised step discovery and localization in instructional videos
Instructional videos are an important resource to learn procedural tasks from human
demonstrations. However, the instruction steps in such videos are typically short and sparse …
PDPP: Projected diffusion for procedure planning in instructional videos
In this paper, we study the problem of procedure planning in instructional videos, which aims
to make goal-directed plans given the current visual observations in unstructured real-life …
Learning to ground instructional articles in videos through narrations
In this paper, we present an approach for localizing steps of procedural activities in narrated
how-to videos. To deal with the scarcity of labeled data at scale, we source the step …
Detours for navigating instructional videos
We introduce the video detours problem for navigating instructional videos. Given a source
video and a natural language query asking to alter the how-to video's current path of …
ExpertAF: Expert actionable feedback from video
Feedback is essential for learning a new skill or improving one's current skill-level. However,
current methods for skill-assessment from video only provide scores or compare …
Every Problem, Every Step, All In Focus: Learning to Solve Vision-Language Problems with Integrated Attention
Integrating information from vision and language modalities has sparked interesting
applications in the fields of computer vision and natural language processing. Existing …
Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos
In this paper, we explore the capability of an agent to construct a logical sequence of action
steps, thereby assembling a strategic procedural plan. This plan is crucial for navigating from …