A survey of robot intelligence with large language models

H Jeong, H Lee, C Kim, S Shin - Applied Sciences, 2024 - mdpi.com
Since the emergence of ChatGPT, research on large language models (LLMs) has actively
progressed across various fields. LLMs, pre-trained on vast text datasets, have exhibited …

Video language planning

Y Du, M Yang, P Florence, F Xia, A Wahid… - arXiv preprint arXiv …, 2023 - arxiv.org
We are interested in enabling visual planning for complex long-horizon tasks in the space of
generated videos and language, leveraging recent advances in large generative models …

ChainedDiffuser: Unifying trajectory diffusion and keypose prediction for robotic manipulation

Z Xian, N Gkanatsios, T Gervet, TW Ke… - … Annual Conference on …, 2023 - openreview.net
We present ChainedDiffuser, a policy architecture that unifies action keypose prediction and
trajectory diffusion generation for learning robot manipulation from demonstrations. Our …

Shelving, stacking, hanging: Relational pose diffusion for multi-modal rearrangement

A Simeonov, A Goyal, L Manuelli, L Yen-Chen… - arXiv preprint arXiv …, 2023 - arxiv.org
We propose a system for rearranging objects in a scene to achieve a desired object-scene
placing relationship, such as a book inserted in an open slot of a bookshelf. The pipeline …

Can pre-trained text-to-image models generate visual goals for reinforcement learning?

J Gao, K Hu, G Xu, H Xu - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Pre-trained text-to-image generative models can produce diverse, semantically rich, and
realistic images from natural language descriptions. Compared with language, images …

SG-Bot: Object rearrangement via coarse-to-fine robotic imagination on scene graphs

G Zhai, X Cai, D Huang, Y Di… - … on Robotics and …, 2024 - ieeexplore.ieee.org
Object rearrangement is pivotal in robotic-environment interactions, representing a
significant capability in embodied AI. In this paper, we present SG-Bot, a novel …

Act3D: Infinite resolution action detection transformer for robotic manipulation

T Gervet, Z Xian, N Gkanatsios… - arXiv preprint arXiv …, 2023 - arxiv.org
3D perceptual representations are well suited for robot manipulation as they easily encode
occlusions and simplify spatial reasoning. Many manipulation tasks require high spatial …

Act3D: 3D feature field transformers for multi-task robotic manipulation

T Gervet, Z Xian, N Gkanatsios… - 7th Annual Conference …, 2023 - openreview.net
3D perceptual representations are well suited for robot manipulation as they easily encode
occlusions and simplify spatial reasoning. Many manipulation tasks require high spatial …

Deep generative models in robotics: A survey on learning from multimodal demonstrations

J Urain, A Mandlekar, Y Du, M Shafiullah, D Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning from Demonstrations, the field that proposes to learn robot behavior models from
data, is gaining popularity with the emergence of deep generative models. Although the …

Gen2Sim: Scaling up robot learning in simulation with generative models

P Katara, Z Xian, K Fragkiadaki - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Generalist robot manipulators need to learn a wide variety of manipulation skills across
diverse environments. Current robot training pipelines rely on humans to provide kinesthetic …