A policy-guided imitation approach for offline reinforcement learning

H Xu, L Jiang, L Jianxiong… - Advances in Neural …, 2022 - proceedings.neurips.cc
Offline reinforcement learning (RL) methods can generally be categorized into two types:
RL-based and Imitation-based. RL-based methods could in principle enjoy out-of-distribution …

Is model ensemble necessary? Model-based RL via a single model with Lipschitz regularized value function

R Zheng, X Wang, H Xu, F Huang - arXiv preprint arXiv:2302.01244, 2023 - arxiv.org
Probabilistic dynamics model ensemble is widely used in existing model-based
reinforcement learning methods as it outperforms a single dynamics model in both …

Diffusion imitation from observation

BR Huang, CK Yang, CM Lai, DJ Wu… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning from observation (LfO) aims to imitate experts by learning from state-only
demonstrations without requiring action labels. Existing adversarial imitation learning …

Towards Generalist Robot Learning from Internet Video: A Survey

R McCarthy, DCH Tan, D Schmidt, F Acero… - arXiv preprint arXiv …, 2024 - arxiv.org
This survey presents an overview of methods for learning from video (LfV) in the context of
reinforcement learning (RL) and robotics. We focus on methods capable of scaling to large …

Imitation learning from observation with automatic discount scheduling

Y Liu, W Dong, Y Hu, C Wen, ZH Yin, C Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Humans often acquire new skills through observation and imitation. For robotic agents,
learning from the plethora of unlabeled video demonstration data available on the Internet …

DGTRL: Deep graph transfer reinforcement learning method based on fusion of knowledge and data

G Chen, J Qi, Y Gao, X Zhu, Z Dong, Y Sun - Information Sciences, 2024 - Elsevier
Deep reinforcement learning has shown promising application effects in many fields.
However, issues such as low sample efficiency and weak knowledge transfer and …

Learning rational subgoals from demonstrations and instructions

Z Luo, J Mao, J Wu, T Lozano-Pérez… - Proceedings of the …, 2023 - ojs.aaai.org
We present a framework for learning useful subgoals that support efficient long-term
planning to achieve novel goals. At the core of our framework is a collection of rational …

NOLO: Navigate Only Look Once

B Zhou, J Wang, Z Lu - arXiv preprint arXiv:2408.01384, 2024 - arxiv.org
The in-context learning ability of Transformer models has brought new possibilities to visual
navigation. In this paper, we focus on the video navigation setting, where an in-context …

GAN-MPC: Training Model Predictive Controllers with Parameterized Cost Functions using Demonstrations from Non-identical Experts

R Burnwal, A Santara, NP Bhatt, B Ravindran… - arXiv preprint arXiv …, 2023 - arxiv.org
Model predictive control (MPC) is a popular approach for trajectory optimization in practical
robotics applications. MPC policies can optimize trajectory parameters under kinodynamic …

Survey of imitation learning: tradition and new advances (模仿学习综述: 传统与新进展)

张超, 白文松, 杜歆, 柳伟杰, 周晨浩, 钱徽 - Journal of Image and Graphics (中国图象图形学报), 2023