Grounding Video Models to Actions through Goal Conditioned Exploration
Large video models, pretrained on massive amounts of Internet video, provide a rich source
of physical knowledge about the dynamics and motions of objects and tasks. However …
ECRAP: Exophora Resolution and Classifying User Commands for Robot Action Planning by Large Language Models
The ability to understand a variety of verbal instructions and perform tasks is important for
daily life support robots. People's speech to the robot may include greetings and …