A survey of embodied AI: From simulators to research tasks
There has been an emerging paradigm shift from the era of “internet AI” to “embodied AI,”
where AI algorithms and agents no longer learn from datasets of images, videos or text …
RT-1: Robotics transformer for real-world control at scale
A Brohan, N Brown, J Carbajal, Y Chebotar… - arXiv preprint arXiv …, 2022 - arxiv.org
By transferring knowledge from large, diverse, task-agnostic datasets, modern machine
learning models can solve specific downstream tasks either zero-shot or with small task …
Q-Transformer: Scalable offline reinforcement learning via autoregressive Q-functions
In this work, we present a scalable reinforcement learning method for training multi-task
policies from large offline datasets that can leverage both human demonstrations and …
Navigating to objects in the real world
Semantic navigation is necessary to deploy mobile robots in uncontrolled environments
such as homes or hospitals. Many learning-based approaches have been proposed in …
MOKA: Open-vocabulary robotic manipulation through mark-based visual prompting
Open-vocabulary generalization requires robotic systems to perform tasks involving complex
and diverse environments and task goals. While the recent advances in vision language …
History aware multimodal transformer for vision-and-language navigation
Vision-and-language navigation (VLN) aims to build autonomous visual agents that follow
instructions and navigate in real scenes. To remember previously visited locations and …
Open-vocabulary queryable scene representations for real world planning
Large language models (LLMs) have unlocked new capabilities of task planning from
human instructions. However, prior attempts to apply LLMs to real-world robotic tasks are …
Think global, act local: Dual-scale graph transformer for vision-and-language navigation
Following language instructions to navigate in unseen environments is a challenging
problem for autonomous embodied agents. The agent not only needs to ground languages …
Spatio-temporal graph transformer networks for pedestrian trajectory prediction
Understanding crowd motion dynamics is critical to real-world applications, e.g., surveillance
systems and autonomous driving. This is challenging because it requires effectively …
PONI: Potential functions for ObjectGoal navigation with interaction-free learning
State-of-the-art approaches to ObjectGoal navigation (ObjectNav) rely on reinforcement
learning and typically require significant computational resources and time for learning. We …