Real-world robot applications of foundation models: A review
Recent developments in foundation models, like Large Language Models (LLMs) and Vision-Language Models (VLMs), trained on extensive data, facilitate flexible application across …
RT-2: Vision-language-action models transfer web knowledge to robotic control
A Brohan, N Brown, J Carbajal, Y Chebotar… - arXiv preprint arXiv …, 2023 - arxiv.org
We study how vision-language models trained on Internet-scale data can be incorporated directly into end-to-end robotic control to boost generalization and enable emergent …
3D-LLM: Injecting the 3D world into large language models
Large language models (LLMs) and Vision-Language Models (VLMs) have been proven to excel at multiple tasks, such as commonsense reasoning. Powerful as these models can be …
ChatGPT for robotics: Design principles and model abilities
This paper presents an experimental study regarding the use of OpenAI's ChatGPT for robotics applications. We outline a strategy that combines design principles for prompt …
Your diffusion model is secretly a zero-shot classifier
The recent wave of large-scale text-to-image diffusion models has dramatically increased our text-based image generation abilities. These models can generate realistic images for a …
ConceptGraphs: Open-vocabulary 3D scene graphs for perception and planning
For robots to perform a wide variety of tasks, they require a 3D representation of the world that is semantically rich, yet compact and efficient for task-driven perception and planning …
Visual language maps for robot navigation
Grounding language to the visual observations of a navigating agent can be performed using off-the-shelf visual-language models pretrained on Internet-scale data (e.g., image …
RoboAgent: Generalization and efficiency in robot manipulation via semantic augmentations and action chunking
The grand aim of having a single robot that can manipulate arbitrary objects in diverse settings is at odds with the paucity of robotics datasets. Acquiring and growing such datasets …
Transfer learning in robotics: An upcoming breakthrough? A review of promises and challenges
Transfer learning is a conceptually-enticing paradigm in pursuit of truly intelligent embodied agents. The core concept, reusing prior knowledge to learn in and from novel situations, is …