Cognitive architectures for language agents

T Sumers, S Yao, K Narasimhan… - Transactions on Machine …, 2023 - openreview.net
Recent efforts have augmented large language models (LLMs) with external resources (e.g.,
the Internet) or internal control flows (e.g., prompt chaining) for tasks requiring grounding or …

Robots that ask for help: Uncertainty alignment for large language model planners

AZ Ren, A Dixit, A Bodrova, S Singh, S Tu… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) exhibit a wide range of promising capabilities--from step-by-
step planning to commonsense reasoning--that may provide utility for robots, but remain …

Building cooperative embodied agents modularly with large language models

H Zhang, W Du, J Shan, Q Zhou, Y Du… - arXiv preprint arXiv …, 2023 - arxiv.org
In this work, we address challenging multi-agent cooperation problems with decentralized
control, raw sensory observations, costly communication, and multi-objective tasks …

3d-vista: Pre-trained transformer for 3d vision and text alignment

Z Zhu, X Ma, Y Chen, Z Deng… - Proceedings of the …, 2023 - openaccess.thecvf.com
3D vision-language grounding (3D-VL) is an emerging field that aims to connect the
3D physical world with natural language, which is crucial for achieving embodied …

Robot learning in the era of foundation models: A survey

X Xiao, J Liu, Z Wang, Y Zhou, Y Qi, Q Cheng… - arXiv preprint arXiv …, 2023 - arxiv.org
The proliferation of Large Language Models (LLMs) has fueled a shift in robot learning
from automation towards general embodied Artificial Intelligence (AI). Adopting foundation …

Embodied navigation with multi-modal information: A survey from tasks to methodology

Y Wu, P Zhang, M Gu, J Zheng, X Bai - Information Fusion, 2024 - Elsevier
Embodied AI aims to create agents that complete complex tasks by interacting with the
environment. A key problem in this field is embodied navigation, which understands multi …

Scaling data generation in vision-and-language navigation

Z Wang, J Li, Y Hong, Y Wang, Q Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent research in language-guided visual navigation has demonstrated a significant
demand for the diversity of traversable environments and the quantity of supervision for …

Habitat 3.0: A co-habitat for humans, avatars and robots

X Puig, E Undersander, A Szot, MD Cote… - arXiv preprint arXiv …, 2023 - arxiv.org
We present Habitat 3.0: a simulation platform for studying collaborative human-robot tasks in
home environments. Habitat 3.0 offers contributions across three dimensions: (1) Accurate …

Mindagent: Emergent gaming interaction

R Gong, Q Huang, X Ma, H Vo, Z Durante… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have the capacity to perform complex scheduling in a
multi-agent system and can coordinate these agents into completing sophisticated tasks that …

Homerobot: Open-vocabulary mobile manipulation

S Yenamandra, A Ramachandran, K Yadav… - arXiv preprint arXiv …, 2023 - arxiv.org
HomeRobot (noun): An affordable compliant robot that navigates homes and manipulates a
wide range of objects in order to complete everyday tasks. Open-Vocabulary Mobile …