MUTEX: Learning unified policies from multimodal task specifications

R Shah, R Martín-Martín, Y Zhu - arXiv preprint arXiv:2309.14320, 2023 - arxiv.org
Humans use different modalities, such as speech, text, images, videos, etc., to communicate
their intent and goals to teammates. For robots to become better assistants, we aim to …

QueST: Self-supervised skill abstractions for learning continuous control

A Mete, H Xue, A Wilcox, Y Chen… - Advances in Neural …, 2025 - proceedings.neurips.cc
Generalization capability, or rather the lack thereof, is one of the most important unsolved
problems in the field of robot learning, and while several large-scale efforts have set out to …

Learning generalizable manipulation policies with object-centric 3D representations

Y Zhu, Z Jiang, P Stone, Y Zhu - arXiv preprint arXiv:2310.14386, 2023 - arxiv.org
We introduce GROOT, an imitation learning method for learning robust policies with object-
centric and 3D priors. GROOT builds policies that generalize beyond their initial training …

BootsTAP: Bootstrapped training for tracking-any-point

C Doersch, P Luc, Y Yang, D Gokay… - Proceedings of the …, 2024 - openaccess.thecvf.com
To endow models with greater understanding of physics and motion, it is useful to enable
them to perceive how solid surfaces move and deform in real scenes. This can be formalized …

Robot utility models: General policies for zero-shot deployment in new environments

H Etukuru, N Naka, Z Hu, S Lee, J Mehu… - arXiv preprint arXiv …, 2024 - arxiv.org
Robot models, particularly those trained with large amounts of data, have recently shown a
plethora of real-world manipulation and navigation capabilities. Several independent efforts …

Humanoid locomotion and manipulation: Current progress and challenges in control, planning, and learning

Z Gu, J Li, W Shen, W Yu, Z Xie, S McCrory… - arXiv preprint arXiv …, 2025 - arxiv.org
Humanoid robots have great potential to perform various human-level skills. These skills
involve locomotion, manipulation, and cognitive capabilities. Driven by advances in machine …

LOTUS: Continual imitation learning for robot manipulation through unsupervised skill discovery

W Wan, Y Zhu, R Shah, Y Zhu - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
We introduce LOTUS, a continual imitation learning algorithm that empowers a physical
robot to continuously and efficiently learn to solve new manipulation tasks throughout its …

Deep generative models in robotics: A survey on learning from multimodal demonstrations

J Urain, A Mandlekar, Y Du, M Shafiullah, D Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning from Demonstrations, the field that proposes to learn robot behavior models from
data, is gaining popularity with the emergence of deep generative models. Although the …

FAST: Efficient action tokenization for vision-language-action models

K Pertsch, K Stachowicz, B Ichter, D Driess… - arXiv preprint arXiv …, 2025 - arxiv.org
Autoregressive sequence models, such as Transformer-based vision-language-action (VLA)
policies, can be tremendously effective for capturing complex and generalizable robotic …

DynaMo: In-Domain Dynamics Pretraining for Visuo-Motor Control

Z Cui, H Pan, A Iyer, S Haldar… - Advances in Neural …, 2025 - proceedings.neurips.cc
Imitation learning has proven to be a powerful tool for training complex visuo-motor policies.
However, current methods often require hundreds to thousands of expert demonstrations to …