A survey of robot intelligence with large language models
Since the emergence of ChatGPT, research on large language models (LLMs) has actively
progressed across various fields. LLMs, pre-trained on vast text datasets, have exhibited …
Video language planning
We are interested in enabling visual planning for complex long-horizon tasks in the space of
generated videos and language, leveraging recent advances in large generative models …
ChainedDiffuser: Unifying trajectory diffusion and keypose prediction for robotic manipulation
We present ChainedDiffuser, a policy architecture that unifies action keypose prediction and
trajectory diffusion generation for learning robot manipulation from demonstrations. Our …
Shelving, stacking, hanging: Relational pose diffusion for multi-modal rearrangement
We propose a system for rearranging objects in a scene to achieve a desired object-scene
placing relationship, such as a book inserted in an open slot of a bookshelf. The pipeline …
Can pre-trained text-to-image models generate visual goals for reinforcement learning?
Pre-trained text-to-image generative models can produce diverse, semantically rich, and
realistic images from natural language descriptions. Compared with language, images …
SG-Bot: Object rearrangement via coarse-to-fine robotic imagination on scene graphs
Object rearrangement is pivotal in robotic-environment interactions, representing a
significant capability in embodied AI. In this paper, we present SG-Bot, a novel …
Act3D: Infinite resolution action detection transformer for robotic manipulation
3D perceptual representations are well suited for robot manipulation as they easily encode
occlusions and simplify spatial reasoning. Many manipulation tasks require high spatial …
Act3D: 3D feature field transformers for multi-task robotic manipulation
3D perceptual representations are well suited for robot manipulation as they easily encode
occlusions and simplify spatial reasoning. Many manipulation tasks require high spatial …
Deep generative models in robotics: A survey on learning from multimodal demonstrations
Learning from Demonstrations, the field that proposes to learn robot behavior models from
data, is gaining popularity with the emergence of deep generative models. Although the …
Gen2Sim: Scaling up robot learning in simulation with generative models
Generalist robot manipulators need to learn a wide variety of manipulation skills across
diverse environments. Current robot training pipelines rely on humans to provide kinesthetic …