Diffusion transformer policy
Recent large visual-language action models pretrained on diverse robot datasets have
demonstrated the potential for generalizing to new environments with a few in-domain data …
Moto: Latent motion token as the bridging language for robot manipulation
Recent developments in Large Language Models pre-trained on extensive corpora have
shown significant success in various natural language processing tasks with minimal fine …
VDT-Auto: End-to-end Autonomous Driving with VLM-Guided Diffusion Transformers
Z Guo, K Gubernatorov, S Asfaw, Z Yagudin… - arXiv preprint arXiv …, 2025 - arxiv.org
In autonomous driving, dynamic environment and corner cases pose significant challenges
to the robustness of ego vehicle's decision-making. To address these challenges …
Towards Fusing Point Cloud and Visual Representations for Imitation Learning
Learning for manipulation requires using policies that have access to rich sensory
information such as point clouds or RGB images. Point clouds efficiently capture geometric …
ManiTrend: Bridging Future Generation and Action Prediction with 3D Flow for Robotic Manipulation
Y He, Q Nie - arXiv preprint arXiv:2502.10028, 2025 - arxiv.org
Language-conditioned manipulation is a vital but challenging robotic task due to the
high-level abstraction of language. To address this, researchers have sought improved goal …