Open X-Embodiment: Robotic learning datasets and RT-X models

A O'Neill, A Rehman, A Gupta, A Maddukuri… - arXiv preprint arXiv …, 2023 - arxiv.org
Large, high-capacity models trained on diverse datasets have shown remarkable successes
in efficiently tackling downstream applications. In domains from NLP to Computer Vision …

Diffusion reward: Learning rewards via conditional video diffusion

T Huang, G Jiang, Y Ze, H Xu - European Conference on Computer Vision, 2024 - Springer
Learning rewards from expert videos offers an affordable and effective solution to specify the
intended behaviors for reinforcement learning (RL) tasks. In this work, we propose Diffusion …

A practical roadmap to learning from demonstration for robotic manipulators in manufacturing

A Barekatain, H Habibi, H Voos - Robotics, 2024 - mdpi.com
This paper provides a structured and practical roadmap for practitioners to integrate learning
from demonstration (LfD) into manufacturing tasks, with a specific focus on industrial …

Integrating reinforcement learning with foundation models for autonomous robotics: Methods and perspectives

A Moroncelli, V Soni, AA Shahid, M Maccarini… - arXiv preprint arXiv …, 2024 - arxiv.org
Foundation models (FMs), large deep learning models pre-trained on vast, unlabeled
datasets, exhibit powerful capabilities in understanding complex patterns and generating …

VIEW: Visual imitation learning with waypoints

A Jonnavittula, S Parekh, DP Losey - Autonomous Robots, 2025 - Springer
Robots can use visual imitation learning (VIL) to learn manipulation tasks from video
demonstrations. However, translating visual observations into actionable robot policies is …

ViViDex: Learning vision-based dexterous manipulation from human videos

Z Chen, S Chen, E Arlaud, I Laptev… - arXiv preprint arXiv …, 2024 - arxiv.org
In this work, we aim to learn a unified vision-based policy for multi-fingered robot hands to
manipulate a variety of objects in diverse poses. Though prior work has shown benefits of …

Human-oriented representation learning for robotic manipulation

M Huo, M Ding, C Xu, T Tian, X Zhu, Y Mu… - arXiv preprint arXiv …, 2023 - arxiv.org
Humans inherently possess generalizable visual representations that empower them to
efficiently explore and interact with the environments in manipulation tasks. We advocate …

Giving robots a hand: Learning generalizable manipulation with eye-in-hand human video demonstrations

MJ Kim, J Wu, C Finn - arXiv preprint arXiv:2307.05959, 2023 - arxiv.org
Eye-in-hand cameras have shown promise in enabling greater sample efficiency and
generalization in vision-based robotic manipulation. However, for robotic imitation, it is still …