Grounding Video Models to Actions through Goal Conditioned Exploration

Y Luo, Y Du - arXiv preprint arXiv:2411.07223, 2024 - arxiv.org
Large video models, pretrained on massive amounts of Internet video, provide a rich source
of physical knowledge about the dynamics and motions of objects and tasks. However …

ECRAP: Exophora Resolution and Classifying User Commands for Robot Action Planning by Large Language Models

A Oyama, S Hasegawa, Y Hagiwara… - 2024 Eighth IEEE …, 2024 - ieeexplore.ieee.org
The ability to understand a variety of verbal instructions and perform tasks is important for
daily life support robots. People's speech to the robot may include greetings and …