The rise and potential of large language model based agents: A survey

Z Xi, W Chen, X Guo, W He, Y Ding, B Hong… - Science China …, 2025 - Springer
For a long time, researchers have sought artificial intelligence (AI) that matches or exceeds
human intelligence. AI agents, which are artificial entities capable of sensing the …

Inner monologue: Embodied reasoning through planning with language models

W Huang, F Xia, T Xiao, H Chan, J Liang… - arXiv preprint arXiv …, 2022 - arxiv.org
Recent works have shown how the reasoning capabilities of Large Language Models
(LLMs) can be applied to domains beyond natural language processing, such as planning …

Do as I can, not as I say: Grounding language in robotic affordances

M Ahn, A Brohan, N Brown, Y Chebotar… - arXiv preprint arXiv …, 2022 - arxiv.org
Large language models can encode a wealth of semantic knowledge about the world. Such
knowledge could be extremely useful to robots aiming to act upon high-level, temporally …

Language models as zero-shot planners: Extracting actionable knowledge for embodied agents

W Huang, P Abbeel, D Pathak… - … conference on machine …, 2022 - proceedings.mlr.press
Can world knowledge learned by large language models (LLMs) be used to act in
interactive environments? In this paper, we investigate the possibility of grounding high-level …

Language models meet world models: Embodied experiences enhance language models

J Xiang, T Tao, Y Gu, T Shu, Z Wang… - Advances in neural …, 2024 - proceedings.neurips.cc
While large language models (LMs) have shown remarkable capabilities across numerous
tasks, they often struggle with simple reasoning and planning in physical environments …

3D concept learning and reasoning from multi-view images

Y Hong, C Lin, Y Du, Z Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Humans are able to accurately reason in 3D by gathering multi-view observations of the
surrounding world. Inspired by this insight, we introduce a new large-scale benchmark for …

Do embodied agents dream of pixelated sheep: Embodied decision making using language guided world modelling

K Nottingham, P Ammanabrolu, A Suhr… - International …, 2023 - proceedings.mlr.press
Reinforcement learning (RL) agents typically learn tabula rasa, without prior knowledge of
the world. However, if initialized with knowledge of high-level subgoals and transitions …

Grounded decoding: Guiding text generation with grounded models for robot control

W Huang, F Xia, D Shah, D Driess, A Zeng, Y Lu… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent progress in large language models (LLMs) has demonstrated the ability to learn and
leverage Internet-scale knowledge through pre-training with autoregressive models …

TEACh: Task-driven embodied agents that chat

A Padmakumar, J Thomason, A Shrivastava… - Proceedings of the …, 2022 - ojs.aaai.org
Robots operating in human spaces must be able to engage in natural language interaction,
both understanding and executing instructions, and using conversation to resolve ambiguity …

SQA3D: Situated question answering in 3D scenes

X Ma, S Yong, Z Zheng, Q Li, Y Liang, SC Zhu… - arXiv preprint arXiv …, 2022 - arxiv.org
We propose a new task to benchmark scene understanding of embodied agents: Situated
Question Answering in 3D Scenes (SQA3D). Given a scene context (e.g., a 3D scan), SQA3D …