SPOT: SE(3) Pose Trajectory Diffusion for Object-Centric Manipulation

CC Hsu, B Wen, J Xu, Y Narang, X Wang, Y Zhu… - arXiv preprint, 2024 - arxiv.org
We introduce SPOT, an object-centric imitation learning framework. The key idea is to
capture each task by an object-centric representation, specifically the SE(3) object pose …
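
As a rough illustration of the representation the abstract names (not SPOT's actual code or parameterization), an SE(3) object pose trajectory can be stored as a sequence of 4x4 homogeneous transforms; a minimal sketch, with all names and the delta parameterization being assumptions:

```python
import numpy as np

def relative_pose(T_a, T_b):
    """Relative SE(3) transform taking pose T_a to pose T_b (both 4x4)."""
    return np.linalg.inv(T_a) @ T_b

# Hypothetical object trajectory: T_obj[t] is a 4x4 pose at step t.
T_obj = [np.eye(4) for _ in range(10)]
for t, T in enumerate(T_obj):
    T[:3, 3] = [0.05 * t, 0.0, 0.1]   # object slides along x

# Per-step pose deltas: one plausible target a pose-trajectory diffusion
# model could denoise (an assumption, not a claim about SPOT).
deltas = [relative_pose(T_obj[t], T_obj[t + 1]) for t in range(len(T_obj) - 1)]
print(deltas[0][:3, 3])   # -> [0.05 0.   0.  ]
```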

One-shot imitation under mismatched execution

K Kedia, P Dan, A Chao, MA Pace… - arXiv preprint, 2024 - arxiv.org
Human demonstrations as prompts are a powerful way to program robots to do long-horizon
manipulation tasks. However, translating these demonstrations into robot-executable actions …

Learning from Massive Human Videos for Universal Humanoid Pose Control

J Mao, S Zhao, S Song, T Shi, J Ye, M Zhang… - arXiv preprint, 2024 - arxiv.org
Scalable learning of humanoid robots is crucial for their deployment in real-world
applications. While traditional approaches primarily rely on reinforcement learning or …

Motion Before Action: Diffusing Object Motion as Manipulation Condition

Y Su, X Zhan, H Fang, YL Li, C Lu, L Yang - arXiv preprint, 2024 - arxiv.org
Inferring object motion representations from observations enhances the performance of
robotic manipulation tasks. This paper introduces a new paradigm for robot imitation …
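
One hedged reading of "object motion as manipulation condition" is that a predicted object-motion sequence is fed to the policy as an extra conditioning input; the toy sketch below shows only that generic wiring, with every name and dimension made up, and says nothing about the paper's actual architecture:

```python
import numpy as np

def condition_on_motion(obs, predicted_motion):
    """Concatenate a flattened predicted object-motion sequence onto the
    observation vector, forming the conditioned policy input."""
    return np.concatenate([obs, predicted_motion.ravel()])

obs = np.zeros(32)                    # hypothetical state features
predicted_motion = np.zeros((8, 6))   # 8 future steps of (x, y, z, roll, pitch, yaw)
policy_input = condition_on_motion(obs, predicted_motion)
print(policy_input.shape)             # -> (80,)
```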

ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions

T Souček, P Gatti, M Wray, I Laptev, D Damen… - arXiv preprint, 2024 - arxiv.org
The goal of this work is to generate step-by-step visual instructions in the form of a sequence
of images, given an input image that provides the scene context and the sequence of textual …

Motion Tracks: A Unified Representation for Human-Robot Transfer in Few-Shot Imitation Learning

J Ren, P Sundaresan, D Sadigh, S Choudhury… - arXiv preprint, 2025 - arxiv.org
Teaching robots to autonomously complete everyday tasks remains a challenge. Imitation
Learning (IL) is a powerful approach that imbues robots with skills via demonstrations, but is …
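
The "motion tracks" of the title suggest 2D keypoint trajectories on the image plane, a representation shared by human hands and robot grippers; the pinhole projection below is a generic way to obtain such tracks from 3D keypoints, offered as a sketch under assumed intrinsics and a made-up trajectory, not as the paper's code:

```python
import numpy as np

# Hypothetical camera intrinsics (fx, fy, cx, cy).
K = np.array([[600.0,   0.0, 320.0],
              [  0.0, 600.0, 240.0],
              [  0.0,   0.0,   1.0]])

def project_track(points_3d):
    """Project a (T, 3) sequence of 3D keypoints (camera frame, z > 0)
    to a (T, 2) track of pixel coordinates."""
    uvw = points_3d @ K.T              # (T, 3) homogeneous pixel coords
    return uvw[:, :2] / uvw[:, 2:3]    # perspective divide

# A made-up gripper-tip trajectory moving toward the camera.
traj = np.stack([np.array([0.1, 0.0, 0.8 - 0.02 * t]) for t in range(10)])
track_2d = project_track(traj)
print(track_2d[0], track_2d[-1])
```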

Zero-Shot Monocular Scene Flow Estimation in the Wild

Y Liang, A Badki, H Su, J Tompkin, O Gallo - arXiv preprint, 2025 - arxiv.org
Large models have shown generalization across datasets for many low-level vision tasks,
like depth estimation, but no such general models exist for scene flow. Even though scene …
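
Scene flow is the per-point 3D motion field between two frames; the snippet below states that definition concretely for a pair of depth maps with known per-pixel correspondence (an idealized setting chosen for clarity, with hypothetical intrinsics, not the paper's zero-shot method):

```python
import numpy as np

def backproject(depth, K):
    """Lift an (H, W) depth map to (H, W, 3) camera-frame points."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    x = (u - K[0, 2]) * depth / K[0, 0]
    y = (v - K[1, 2]) * depth / K[1, 1]
    return np.stack([x, y, depth], axis=-1)

# Idealized inputs: static camera, identity pixel correspondences.
K = np.array([[500.0, 0.0, 64.0], [0.0, 500.0, 48.0], [0.0, 0.0, 1.0]])
depth_t0 = np.full((96, 128), 1.0)
depth_t1 = np.full((96, 128), 0.9)   # surface moved 0.1 m toward the camera

# Scene flow: 3D displacement of each surface point between frames.
flow = backproject(depth_t1, K) - backproject(depth_t0, K)
print(flow[48, 64])   # ~[0, 0, -0.1] at the principal point
```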

FLIP: Flow-Centric Generative Planning for General-Purpose Manipulation Tasks

C Gao, H Zhang, Z Xu, Z Cai, L Shao - arXiv preprint arXiv:2412.08261, 2024 - arxiv.org
We aim to develop a model-based planning framework for world models that can be scaled
with increasing model and data budgets for general-purpose manipulation tasks with only …

RoboPanoptes: The All-seeing Robot with Whole-body Dexterity

X Xu, D Bauer, S Song - arXiv preprint arXiv:2501.05420, 2025 - arxiv.org
We present RoboPanoptes, a capable yet practical robot system that achieves whole-body
dexterity through whole-body vision. Its whole-body dexterity allows the robot to utilize its …

Embodiment-Agnostic Action Planning via Object-Part Scene Flow

W Tang, JH Pan, W Zhan, J Zhou, H Yao… - arXiv preprint, 2024 - arxiv.org
Observing that the key for robotic action planning is to understand the target-object motion
when its associated part is manipulated by the end effector, we propose to generate the 3D …
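
One standard way to turn an object part's flow into a target motion is to fit the rigid SE(3) transform that best explains the flowed points (the Kabsch/Procrustes fit); the sketch below shows that generic fit on made-up data, without claiming it is this paper's planner:

```python
import numpy as np

def fit_rigid_transform(P, Q):
    """Least-squares rotation R and translation t with Q ≈ P @ R.T + t,
    for corresponding (N, 3) point sets (Kabsch algorithm)."""
    Pc, Qc = P - P.mean(0), Q - Q.mean(0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)
    S = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ S @ U.T                 # reflection-safe rotation
    t = Q.mean(0) - R @ P.mean(0)
    return R, t

# Hypothetical part points and their scene flow (pure translation here).
P = np.random.default_rng(0).uniform(-0.05, 0.05, size=(50, 3))
flow = np.tile([0.0, 0.02, 0.0], (50, 1))
R, t = fit_rigid_transform(P, P + flow)
print(np.round(t, 3))   # -> ~[0, 0.02, 0], R ~ identity
```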