Real-world robot applications of foundation models: A review
Recent developments in foundation models, like Large Language Models (LLMs) and Vision-
Language Models (VLMs), trained on extensive data, facilitate flexible application across …
Large language models for robotics: Opportunities, challenges, and perspectives
Large language models (LLMs) have undergone significant expansion and have been
increasingly integrated across various domains. Notably, in the realm of robot task planning …
RT-2: Vision-language-action models transfer web knowledge to robotic control
A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, … - arXiv preprint arXiv …, 2023 - arxiv.org
We study how vision-language models trained on Internet-scale data can be incorporated
directly into end-to-end robotic control to boost generalization and enable emergent …
Foundation models in robotics: Applications, challenges, and the future
We survey applications of pretrained foundation models in robotics. Traditional deep
learning models in robotics are trained on small datasets tailored for specific tasks, which …
Where are we in the search for an artificial visual cortex for embodied intelligence?
We present the largest and most comprehensive empirical study of pre-trained visual
representations (PVRs) or visual 'foundation models' for Embodied AI. First, we curate …
Eureka: Human-level reward design via coding large language models
Large Language Models (LLMs) have excelled as high-level semantic planners for
sequential decision-making tasks. However, harnessing them to learn complex low-level …
Goal representations for instruction following: A semi-supervised language interface to control
Our goal is for robots to follow natural language instructions like “put the towel next to the
microwave.” But getting large amounts of labeled data, i.e., data that contains demonstrations …
Zero-shot robotic manipulation with pretrained image-editing diffusion models
If generalist robots are to operate in truly unstructured environments, they need to be able to
recognize and reason about novel objects and scenarios. Such objects and scenarios might …
Auto MC-Reward: Automated dense reward design with large language models for Minecraft
Many reinforcement learning environments (e.g., Minecraft) provide only sparse rewards that
indicate task completion or failure with binary values. The challenge in exploration efficiency …