Learning by watching: A review of video-based learning approaches for robot manipulation

C Eze, C Crick - arXiv

Contrastively learning visual attention as affordance cues from demonstrations for robotic grasping

Y Zha, S Bhambri, L Guan - 2021 IEEE/RSJ International …, 2021 - ieeexplore.ieee.org
Conventional works that learn grasping affordance from demonstrations need to explicitly
predict grasping configurations, such as gripper approaching angles or grasping preshapes …

Perception Stitching: Zero-Shot Perception Encoder Transfer for Visuomotor Robot Policies

P Jian, E Lee, Z Bell, MM Zavlanos, B Chen - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-based imitation learning has shown promising capabilities of endowing robots with
various motion skills given visual observation. However, current visuomotor policies fail to …

FoV-Net: Field-of-view extrapolation using self-attention and uncertainty

L Ma, S Georgoulis, X Jia… - IEEE Robotics and …, 2021 - ieeexplore.ieee.org
The ability to make educated predictions about their surroundings, and associate them with
certain confidence, is important for intelligent systems, like autonomous vehicles and robots …

Multi-view contrastive learning from demonstrations

A Correia, LA Alexandre - 2022 Sixth IEEE International …, 2022 - ieeexplore.ieee.org
This paper presents a framework for learning visual representations from unlabeled video
demonstrations captured from multiple viewpoints. We show that these representations are …

Attentive One-Shot Meta-Imitation Learning From Visual Demonstration

V Bhutani, A Majumder, M Vankadari… - … on Robotics and …, 2022 - ieeexplore.ieee.org
The ability to apply a previously-learned skill (e.g., pushing) to a new task (context or object)
is an important requirement for new-age robots. An attempt is made to solve this problem in …

Perceiving, Planning, Acting, and Self-Explaining: A Cognitive Quartet with Four Neural Networks

Y Zha - 2022 - search.proquest.com
Learning to accomplish complex tasks may require a tight coupling among different levels of
cognitive functions or components, like perception, acting, planning, and self-explaining …

Contrastive Learning from Demonstrations

A Correia, LA Alexandre - arXiv preprint arXiv:2201.12813, 2022 - arxiv.org
This paper presents a framework for learning visual representations from unlabeled video
demonstrations captured from multiple viewpoints. We show that these representations are …

Understanding Manipulation Contexts by Vision and Language for Robotic Vision

C Jiang - 2021 - era.library.ualberta.ca
In Activities of Daily Living (ADLs), humans perform thousands of arm and hand
object manipulation tasks, such as picking, pouring and drinking a drink. Interpreting such …