The rise and potential of large language model based agents: A survey

Z **, W Chen, X Guo, W He, Y Ding, B Hong… - Science China …, 2025 - Springer
For a long time, researchers have sought artificial intelligence (AI) that matches or exceeds
human intelligence. AI agents, which are artificial entities capable of sensing the …

Foundation Models Defining a New Era in Vision: a Survey and Outlook

M Awais, M Naseer, S Khan, RM Anwer… - … on Pattern Analysis …, 2025 - ieeexplore.ieee.org
Vision systems that see and reason about the compositional nature of visual scenes are
fundamental to understanding our world. The complex relations between objects and their …

LM-Nav: Robotic navigation with large pre-trained models of language, vision, and action

D Shah, B Osiński, S Levine - Conference on Robot …, 2023 - proceedings.mlr.press
Goal-conditioned policies for robotic navigation can be trained on large, unannotated
datasets, providing for good generalization to real-world settings. However, particularly in …

RVT: Robotic View Transformer for 3D object manipulation

A Goyal, J Xu, Y Guo, V Blukis… - Conference on Robot …, 2023 - proceedings.mlr.press
For 3D object manipulation, methods that build an explicit 3D representation perform better
than those relying only on camera images. But using explicit 3D representations like voxels …

Navigating to objects in the real world

T Gervet, S Chintala, D Batra, J Malik, DS Chaplot - Science Robotics, 2023 - science.org
Semantic navigation is necessary to deploy mobile robots in uncontrolled environments
such as homes or hospitals. Many learning-based approaches have been proposed in …

NoMaD: Goal masked diffusion policies for navigation and exploration

A Sridhar, D Shah, C Glossop… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Robotic learning for navigation in unfamiliar environments needs to provide policies for both
task-oriented navigation (i.e., reaching a goal that the robot has located), and task-agnostic …

ViNT: A foundation model for visual navigation

D Shah, A Sridhar, N Dashora, K Stachowicz… - arXiv preprint arXiv …, 2023 - arxiv.org
General-purpose pre-trained models ("foundation models") have enabled practitioners to
produce generalizable solutions for individual machine learning problems with datasets that …

Large language models for robotics: A survey

F Zeng, W Gan, Y Wang, N Liu, PS Yu - arXiv preprint arXiv:2311.07226, 2023 - arxiv.org
The human ability to learn, generalize, and control complex manipulation tasks through
multi-modality feedback suggests a unique capability, which we refer to as dexterity intelligence …

GNM: A general navigation model to drive any robot

D Shah, A Sridhar, A Bhorkar, N Hirose… - … on Robotics and …, 2023 - ieeexplore.ieee.org
Learning provides a powerful tool for vision-based navigation, but the capabilities of
learning-based policies are constrained by limited training data. If we could combine data …

Navigating to objects specified by images

J Krantz, T Gervet, K Yadav, A Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Images are a convenient way to specify which particular object instance an embodied agent
should navigate to. Solving this task requires semantic visual reasoning and exploration of …