EgoTV: Egocentric task verification from natural language task descriptions
To enable progress towards egocentric agents capable of understanding everyday tasks
specified in natural language, we propose a benchmark and a synthetic dataset called …
Why Not Use Your Textbook? Knowledge-Enhanced Procedure Planning of Instructional Videos
In this paper, we explore the capability of an agent to construct a logical sequence of action
steps, thereby assembling a strategic procedural plan. This plan is crucial for navigating from …
Pretrained language models as visual planners for human assistance
In our pursuit of advancing multi-modal AI assistants capable of guiding users to achieve
complex multi-step goals, we propose the task of 'Visual Planning for Assistance (VPA)' …
Box2Flow: Instance-Based Action Flow Graphs from Videos
A large amount of procedural videos on the web show how to complete various tasks. These
tasks can often be accomplished in different ways and step orderings, with some steps able …
EgoTV: Egocentric Task Verification from Natural Language Task Descriptions
To enable progress towards egocentric agents capable of understanding everyday tasks
specified in natural language, we propose a benchmark and a synthetic dataset called …