A survey on integration of large language models with intelligent robots

Y Kim, D Kim, J Choi, J Park, N Oh, D Park - Intelligent Service Robotics, 2024 - Springer
In recent years, the integration of large language models (LLMs) has revolutionized the field
of robotics, enabling robots to communicate, understand, and reason with human-like …

Foundation models in robotics: Applications, challenges, and the future

R Firoozi, J Tucker, S Tian… - … Journal of Robotics …, 2023 - journals.sagepub.com
We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …

Toward general-purpose robots via foundation models: A survey and meta-analysis

Y Hu, Q **e, V Jain, J Francis, J Patrikar… - arxiv preprint arxiv …, 2023 - arxiv.org
Building general-purpose robots that operate seamlessly in any environment, with any
object, and utilizing various skills to complete diverse tasks has been a long-standing goal in …

ivideogpt: Interactive videogpts are scalable world models

J Wu, S Yin, N Feng, X He, D Li… - Advances in Neural …, 2025 - proceedings.neurips.cc
World models empower model-based agents to interactively explore, reason, and plan
within imagined environments for real-world decision-making. However, the high demand …

Rekep: Spatio-temporal reasoning of relational keypoint constraints for robotic manipulation

W Huang, C Wang, Y Li, R Zhang, L Fei-Fei - arxiv preprint arxiv …, 2024 - arxiv.org
Representing robotic manipulation tasks as constraints that associate the robot and the
environment is a promising way to encode desired robot behaviors. However, it remains …

Worldcoder, a model-based llm agent: Building world models by writing code and interacting with the environment

H Tang, D Key, K Ellis - Advances in Neural Information …, 2025 - proceedings.neurips.cc
We give a model-based agent that builds a Python program representing its knowledge of
the world based on its interactions with the environment. The world model tries to explain its …

Towards generalist robot learning from internet video: A survey

R McCarthy, DCH Tan, D Schmidt, F Acero… - arxiv preprint arxiv …, 2024 - arxiv.org
Scaling deep learning to massive, diverse internet data has yielded remarkably general
capabilities in visual and natural language understanding and generation. However, data …

Moka: Open-vocabulary robotic manipulation through mark-based visual prompting

F Liu, K Fang, P Abbeel, S Levine - First Workshop on Vision …, 2024 - openreview.net
Open-vocabulary generalization requires robotic systems to perform tasks involving complex
and diverse environments and task goals. While the recent advances in vision language …

Copa: General robotic manipulation through spatial constraints of parts with foundation models

H Huang, F Lin, Y Hu, S Wang… - 2024 IEEE/RSJ …, 2024 - ieeexplore.ieee.org
Foundation models pre-trained on web-scale data are shown to encapsulate extensive
world knowledge beneficial for robotic manipulation in the form of task planning. However …

Gr-2: A generative video-language-action model with web-scale knowledge for robot manipulation

CL Cheang, G Chen, Y **g, T Kong, H Li, Y Li… - arxiv preprint arxiv …, 2024 - arxiv.org
We present GR-2, a state-of-the-art generalist robot agent for versatile and generalizable
robot manipulation. GR-2 is first pre-trained on a vast number of Internet videos to capture …