Real-world robot applications of foundation models: A review

K Kawaharazuka, T Matsushima… - Advanced …, 2024 - Taylor & Francis
Recent developments in foundation models, like Large Language Models (LLMs) and Vision-
Language Models (VLMs), trained on extensive data, facilitate flexible application across …

Rt-2: Vision-language-action models transfer web knowledge to robotic control

A Brohan, N Brown, J Carbajal, Y Chebotar… - arXiv preprint arXiv …, 2023 - arxiv.org
We study how vision-language models trained on Internet-scale data can be incorporated
directly into end-to-end robotic control to boost generalization and enable emergent …

3d-llm: Injecting the 3d world into large language models

Y Hong, H Zhen, P Chen, S Zheng… - Advances in …, 2023 - proceedings.neurips.cc
Large language models (LLMs) and Vision-Language Models (VLMs) have been shown to
excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be …

Chatgpt for robotics: Design principles and model abilities

SH Vemprala, R Bonatti, A Bucker, A Kapoor - IEEE Access, 2024 - ieeexplore.ieee.org
This paper presents an experimental study regarding the use of OpenAI's ChatGPT for
robotics applications. We outline a strategy that combines design principles for prompt …

Rt-2: Vision-language-action models transfer web knowledge to robotic control

B Zitkovich, T Yu, S Xu, P Xu, T Xiao… - … on Robot Learning, 2023 - proceedings.mlr.press
We study how vision-language models trained on Internet-scale data can be incorporated
directly into end-to-end robotic control to boost generalization and enable emergent …

Your diffusion model is secretly a zero-shot classifier

AC Li, M Prabhudesai, S Duggal… - Proceedings of the …, 2023 - openaccess.thecvf.com
The recent wave of large-scale text-to-image diffusion models has dramatically increased
our text-based image generation abilities. These models can generate realistic images for a …

Conceptgraphs: Open-vocabulary 3d scene graphs for perception and planning

Q Gu, A Kuwajerwala, S Morin… - … on Robotics and …, 2024 - ieeexplore.ieee.org
For robots to perform a wide variety of tasks, they require a 3D representation of the world
that is semantically rich, yet compact and efficient for task-driven perception and planning …

Visual language maps for robot navigation

C Huang, O Mees, A Zeng… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
Grounding language to the visual observations of a navigating agent can be performed
using off-the-shelf visual-language models pretrained on Internet-scale data (e.g., image …

Roboagent: Generalization and efficiency in robot manipulation via semantic augmentations and action chunking

H Bharadhwaj, J Vakil, M Sharma… - … on Robotics and …, 2024 - ieeexplore.ieee.org
The grand aim of having a single robot that can manipulate arbitrary objects in diverse
settings is at odds with the paucity of robotics datasets. Acquiring and growing such datasets …

Transfer learning in robotics: An upcoming breakthrough? A review of promises and challenges

N Jaquier, MC Welle, A Gams, K Yao… - … Journal of Robotics …, 2023 - journals.sagepub.com
Transfer learning is a conceptually enticing paradigm in pursuit of truly intelligent embodied
agents. The core concept—reusing prior knowledge to learn in and from novel situations—is …