Real-world robot applications of foundation models: A review
Recent developments in foundation models, like Large Language Models (LLMs) and Vision-
Language Models (VLMs), trained on extensive data, facilitate flexible application across …
ShapeLLM: Universal 3D object understanding for embodied interaction
This paper presents ShapeLLM, the first 3D Multimodal Large Language Model (LLM)
designed for embodied interaction, exploring a universal 3D object understanding with 3D …
SceneVerse: Scaling 3D Vision-Language Learning for Grounded Scene Understanding
Abstract 3D vision-language (3D-VL) grounding, which aims to align language with 3D
physical environments, stands as a cornerstone in developing embodied agents. In …
GaPartNet: Cross-category domain-generalizable object perception and manipulation via generalizable and actionable parts
For years, researchers have been devoted to generalizable object perception and
manipulation, where cross-category generalizability is highly desired yet underexplored. In …
Scaling proprioceptive-visual learning with heterogeneous pre-trained transformers
One of the roadblocks for training generalist robotic models today is heterogeneity. Previous
robot learning methods often collect data to train with one specific embodiment for one task …
ManipLLM: Embodied multimodal large language model for object-centric robotic manipulation
Robot manipulation relies on accurately predicting contact points and end-effector directions
to ensure successful operation. However, learning-based robot manipulation trained on a …
Large language models as generalizable policies for embodied tasks
We show that large language models (LLMs) can be adapted to be generalizable policies
for embodied visual tasks. Our approach, called Large LAnguage model Reinforcement …
An embodied generalist agent in 3D world
Leveraging massive knowledge and learning schemes from large language models (LLMs),
recent machine learning models show notable successes in building generalist agents that …
RAM: Retrieval-based affordance transfer for generalizable zero-shot robotic manipulation
This work proposes a retrieve-and-transfer framework for zero-shot robotic manipulation,
dubbed RAM, featuring generalizability across various objects, environments, and …
PhyScene: Physically interactable 3D scene synthesis for embodied AI
With recent developments in Embodied Artificial Intelligence (EAI) research, there has been
a growing demand for high-quality large-scale interactive scene generation. While prior …