Diffusion transformer policy

Z Hou, T Zhang, Y Xiong, H Pu, C Zhao, R Tong… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent large visual-language action models pretrained on diverse robot datasets have
demonstrated the potential for generalizing to new environments with a few in-domain data …

Moto: Latent motion token as the bridging language for robot manipulation

Y Chen, Y Ge, Y Li, Y Ge, M Ding, Y Shan… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent developments in Large Language Models pre-trained on extensive corpora have
shown significant success in various natural language processing tasks with minimal fine …

VDT-Auto: End-to-end Autonomous Driving with VLM-Guided Diffusion Transformers

Z Guo, K Gubernatorov, S Asfaw, Z Yagudin… - arXiv preprint arXiv …, 2025 - arxiv.org
In autonomous driving, dynamic environment and corner cases pose significant challenges
to the robustness of ego vehicle's decision-making. To address these challenges …

Towards Fusing Point Cloud and Visual Representations for Imitation Learning

A Donat, X Jia, X Huang, A Taranovic… - arXiv preprint arXiv …, 2025 - arxiv.org
Learning for manipulation requires using policies that have access to rich sensory
information such as point clouds or RGB images. Point clouds efficiently capture geometric …

ManiTrend: Bridging Future Generation and Action Prediction with 3D Flow for Robotic Manipulation

Y He, Q Nie - arXiv preprint arXiv:2502.10028, 2025 - arxiv.org
Language-conditioned manipulation is a vital but challenging robotic task due to the high-
level abstraction of language. To address this, researchers have sought improved goal …