Do as I can, not as I say: Grounding language in robotic affordances

M Ahn, A Brohan, N Brown, Y Chebotar… - arXiv preprint arXiv …, 2022 - arxiv.org
Large language models can encode a wealth of semantic knowledge about the world. Such
knowledge could be extremely useful to robots aiming to act upon high-level, temporally …

Voyager: An open-ended embodied agent with large language models

G Wang, Y Xie, Y Jiang, A Mandlekar, C Xiao… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce Voyager, the first LLM-powered embodied lifelong learning agent in Minecraft
that continuously explores the world, acquires diverse skills, and makes novel discoveries …

Perceiver-Actor: A multi-task transformer for robotic manipulation

M Shridhar, L Manuelli, D Fox - Conference on Robot …, 2023 - proceedings.mlr.press
Transformers have revolutionized vision and natural language processing with their ability to
scale with large datasets. But in robotic manipulation, data is both limited and expensive …

LLM-Planner: Few-shot grounded planning for embodied agents with large language models

CH Song, J Wu, C Washington… - Proceedings of the …, 2023 - openaccess.thecvf.com
This study focuses on using large language models (LLMs) as a planner for embodied
agents that can follow natural language instructions to complete complex tasks in a visually …

Task and motion planning with large language models for object rearrangement

Y Ding, X Zhang, C Paxton… - 2023 IEEE/RSJ …, 2023 - ieeexplore.ieee.org
Multi-object rearrangement is a crucial skill for service robots, and commonsense reasoning
is frequently needed in this process. However, achieving commonsense arrangements …

CLIP-Fields: Weakly supervised semantic fields for robotic memory

NMM Shafiullah, C Paxton, L Pinto, S Chintala… - arXiv preprint arXiv …, 2022 - arxiv.org
We propose CLIP-Fields, an implicit scene model that can be used for a variety of tasks,
such as segmentation, instance identification, semantic search over space, and view …

Do embodied agents dream of pixelated sheep: Embodied decision making using language guided world modelling

K Nottingham, P Ammanabrolu, A Suhr… - International …, 2023 - proceedings.mlr.press
Reinforcement learning (RL) agents typically learn tabula rasa, without prior knowledge of
the world. However, if initialized with knowledge of high-level subgoals and transitions …

ESC: Exploration with soft commonsense constraints for zero-shot object navigation

K Zhou, K Zheng, C Pryor, Y Shen… - International …, 2023 - proceedings.mlr.press
The ability to accurately locate and navigate to a specific object is a crucial capability for
embodied agents that operate in the real world and interact with objects to complete tasks …

FILM: Following instructions in language with modular methods

SY Min, DS Chaplot, P Ravikumar, Y Bisk… - arXiv preprint arXiv …, 2021 - arxiv.org
Recent methods for embodied instruction following are typically trained end-to-end using
imitation learning. This often requires the use of expert trajectories and low-level language …

OK-Robot: What really matters in integrating open-knowledge models for robotics

P Liu, Y Orru, J Vakil, C Paxton, NMM Shafiullah… - arXiv preprint arXiv …, 2024 - arxiv.org
Remarkable progress has been made in recent years in the fields of vision, language, and
robotics. We now have vision models capable of recognizing objects based on language …