Advances and challenges in meta-learning: A technical review

A Vettoruzzo, MR Bouguelia… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Meta-learning empowers learning systems with the ability to acquire knowledge from
multiple tasks, enabling faster adaptation and generalization to new tasks. This review …

Mobile ALOHA: Learning bimanual mobile manipulation with low-cost whole-body teleoperation

Z Fu, TZ Zhao, C Finn - arXiv preprint arXiv:2401.02117, 2024 - arxiv.org
Imitation learning from human demonstrations has shown impressive performance in
robotics. However, most results focus on table-top manipulation, lacking the mobility and …

Affordances from human videos as a versatile representation for robotics

S Bahl, R Mendonca, L Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Building a robot that can understand and learn to interact by watching humans has inspired
several vision problems. However, despite some successful results on static datasets, it …

R3M: A universal visual representation for robot manipulation

S Nair, A Rajeswaran, V Kumar, C Finn… - arXiv preprint arXiv …, 2022 - arxiv.org
We study how visual representations pre-trained on diverse human video data can enable
data-efficient learning of downstream robotic manipulation tasks. Concretely, we pre-train a …

Open X-Embodiment: Robotic Learning Datasets and RT-X Models

A O'Neill, A Rehman, A Maddukuri… - … on Robotics and …, 2024 - ieeexplore.ieee.org
Large, high-capacity models trained on diverse datasets have shown remarkable successes
in efficiently tackling downstream applications. In domains from NLP to Computer Vision …

On the opportunities and risks of foundation models

R Bommasani, DA Hudson, E Adeli, R Altman… - arXiv preprint arXiv …, 2021 - arxiv.org
AI is undergoing a paradigm shift with the rise of models (e.g., BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …

VATT: Transformers for multimodal self-supervised learning from raw video, audio and text

H Akbari, L Yuan, R Qian… - Advances in neural …, 2021 - proceedings.neurips.cc
We present a framework for learning multimodal representations from unlabeled data using
convolution-free Transformer architectures. Specifically, our Video-Audio-Text Transformer …

MimicPlay: Long-horizon imitation learning by watching human play

C Wang, L Fan, J Sun, R Zhang, L Fei-Fei, D Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
Imitation learning from human demonstrations is a promising paradigm for teaching robots
manipulation skills in the real world. However, learning complex long-horizon tasks often …

Language-driven representation learning for robotics

S Karamcheti, S Nair, AS Chen, T Kollar, C Finn… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent work in visual representation learning for robotics demonstrates the viability of
learning from large video datasets of humans performing everyday tasks. Leveraging …

DexCap: Scalable and portable mocap data collection system for dexterous manipulation

C Wang, H Shi, W Wang, R Zhang, L Fei-Fei… - arXiv preprint arXiv …, 2024 - arxiv.org
Imitation learning from human hand motion data presents a promising avenue for imbuing
robots with human-like dexterity in real-world manipulation tasks. Despite this potential …