Real-world robot applications of foundation models: A review
Recent developments in foundation models, like Large Language Models (LLMs) and Vision-
Language Models (VLMs), trained on extensive data, facilitate flexible application across …
Aligning cyber space with physical world: A comprehensive survey on Embodied AI
Embodied Artificial Intelligence (Embodied AI) is crucial for achieving Artificial General
Intelligence (AGI) and serves as a foundation for various applications that bridge cyberspace …
Foundation models in robotics: Applications, challenges, and the future
We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …
OpenEQA: Embodied question answering in the era of foundation models
We present a modern formulation of Embodied Question Answering (EQA) as the task of
understanding an environment well enough to answer questions about it in natural …
OK-Robot: What really matters in integrating open-knowledge models for robotics
Remarkable progress has been made in recent years in the fields of vision, language, and
robotics. We now have vision models capable of recognizing objects based on language …
AffordanceLLM: Grounding affordance from vision language models
Affordance grounding refers to the task of finding the area of an object with which one can
interact. It is a fundamental but challenging task as a successful solution requires the …
Large language models as generalizable policies for embodied tasks
We show that large language models (LLMs) can be adapted to be generalizable policies
for embodied visual tasks. Our approach, called Large LAnguage model Reinforcement …
Prompt a robot to walk with large language models
Large language models (LLMs) pre-trained on vast internet-scale data have showcased
remarkable capabilities across diverse domains. Recently, there has been escalating …
V-IRL: Grounding Virtual Intelligence in Real Life
There is a sensory gulf between the Earth that humans inhabit and the digital realms in
which modern AI agents are created. To develop AI agents that can sense, think, and act as …
Habitat Synthetic Scenes Dataset (HSSD-200): An analysis of 3D scene scale and realism tradeoffs for ObjectGoal navigation
We contribute the Habitat Synthetic Scene Dataset, a dataset of 211 high-quality 3D
scenes and use it to test navigation agent generalization to realistic 3D environments. Our …