Video Prediction Policy: A Generalist Robot Policy with Predictive Visual Representations

Y Hu, Y Guo, P Wang, X Chen, YJ Wang… - arxiv preprint arxiv …
… developing generalist policies capable of
performing multiple tasks. Typically, these policies utilize pre-trained vision encoders to …

Preference Alignment on Diffusion Model: A Comprehensive Survey for Image Generation and Editing

S Wu, X Si, C Xing, J Wang, G Jin, G Cheng… - arxiv preprint arxiv …, 2025 - arxiv.org
The integration of preference alignment with diffusion models (DMs) has emerged as a
transformative approach to enhance image generation and editing capabilities. Although …

VLABench: A Large-Scale Benchmark for Language-Conditioned Robotics Manipulation with Long-Horizon Reasoning Tasks

S Zhang, Z Xu, P Liu, X Yu, Y Li, Q Gao, Z Fei… - arxiv preprint arxiv …, 2024 - arxiv.org
General-purposed embodied agents are designed to understand the users' natural
instructions or intentions and act precisely to complete universal tasks. Recently, methods …

G3Flow: Generative 3D Semantic Flow for Pose-aware and Generalizable Object Manipulation

T Chen, Y Mu, Z Liang, Z Chen, S Peng, Q Chen… - arxiv preprint arxiv …, 2024 - arxiv.org
Recent advances in imitation learning for 3D robotic manipulation have shown promising
results with diffusion-based policies. However, achieving human-level dexterity requires …

You Only Teach Once: Learn One-Shot Bimanual Robotic Manipulation from Video Demonstrations

H Zhou, R Wang, Y Tai, Y Deng, G Liu, K Jia - arxiv preprint arxiv …, 2025 - arxiv.org
Bimanual robotic manipulation is a long-standing challenge of embodied intelligence due to
its characteristics of dual-arm spatial-temporal coordination and high-dimensional action …

SpatialVLA: Exploring Spatial Representations for Visual-Language-Action Model

D Qu, H Song, Q Chen, Y Yao, X Ye, Y Ding… - arxiv preprint arxiv …, 2025 - arxiv.org
In this paper, we claim that spatial understanding is the keypoint in robot manipulation, and
propose SpatialVLA to explore effective spatial representations for the robot foundation …