Unleashing large-scale video generative pre-training for visual robot manipulation

H Wu, Y Jing, C Cheang, G Chen, J Xu, X Li… - arXiv preprint arXiv…, 2023 - arxiv.org
Generative pre-trained models have demonstrated remarkable effectiveness in language
and vision domains by learning useful representations. In this paper, we extend the scope of …

Any-point trajectory modeling for policy learning

C Wen, X Lin, J So, K Chen, Q Dou, Y Gao… - arXiv preprint arXiv…, 2023 - arxiv.org
Learning from demonstration is a powerful method for teaching robots new skills, and having
more demonstration data often improves policy learning. However, the high cost of collecting …

Towards generalist robot learning from internet video: A survey

R McCarthy, DCH Tan, D Schmidt, F Acero… - arXiv preprint arXiv…, 2024 - arxiv.org
Scaling deep learning to massive, diverse internet data has yielded remarkably general
capabilities in visual and natural language understanding and generation. However, data …

Vista: A generalizable driving world model with high fidelity and versatile controllability

S Gao, J Yang, L Chen, K Chitta, Y Qiu… - arXiv preprint arXiv…, 2024 - arxiv.org
World models can foresee the outcomes of different actions, which is of paramount
importance for autonomous driving. Nevertheless, existing driving world models still have …

Learning to act from actionless videos through dense correspondences

PC Ko, J Mao, Y Du, SH Sun… - arXiv preprint arXiv…, 2023 - arxiv.org
In this work, we present an approach to construct a video-based robot policy capable of
reliably executing diverse tasks across different robots and environments from few video …

GR-2: A generative video-language-action model with web-scale knowledge for robot manipulation

CL Cheang, G Chen, Y Jing, T Kong, H Li, Y Li… - arXiv preprint arXiv…, 2024 - arxiv.org
We present GR-2, a state-of-the-art generalist robot agent for versatile and generalizable
robot manipulation. GR-2 is first pre-trained on a vast number of Internet videos to capture …

Sora as an AGI world model? A complete survey on text-to-video generation

J Cho, FD Puspitasari, S Zheng, J Zheng… - arXiv preprint arXiv…, 2024 - arxiv.org
The evolution of video generation from text, starting with animating MNIST numbers to
simulating the physical world with Sora, has progressed at a breakneck speed over the past …

Vision-language models as a source of rewards

K Baumli, S Baveja, F Behbahani, H Chan… - arXiv preprint arXiv…, 2023 - arxiv.org
Building generalist agents that can accomplish many goals in rich open-ended
environments is one of the research frontiers for reinforcement learning. A key limiting factor …

General flow as foundation affordance for scalable robot learning

C Yuan, C Wen, T Zhang, Y Gao - arXiv preprint arXiv:2401.11439, 2024 - arxiv.org
We address the challenge of acquiring real-world manipulation skills with a scalable
framework. We hold the belief that identifying an appropriate prediction target capable of …

A practical roadmap to learning from demonstration for robotic manipulators in manufacturing

A Barekatain, H Habibi, H Voos - Robotics, 2024 - mdpi.com
This paper provides a structured and practical roadmap for practitioners to integrate learning
from demonstration (LfD) into manufacturing tasks, with a specific focus on industrial …