Offline reinforcement learning: Tutorial, review, and perspectives on open problems

S Levine, A Kumar, G Tucker, J Fu - arXiv preprint, 2020 - arxiv.org

Stabilizing off-policy Q-learning via bootstrapping error reduction

A Kumar, J Fu, M Soh, G Tucker… - Advances in neural …, 2019 - proceedings.neurips.cc
Off-policy reinforcement learning aims to leverage experience collected from prior policies
for sample-efficient learning. However, in practice, commonly used off-policy approximate …

Advantage-weighted regression: Simple and scalable off-policy reinforcement learning

XB Peng, A Kumar, G Zhang, S Levine - arXiv preprint arXiv:1910.00177, 2019 - arxiv.org
In this paper, we aim to develop a simple and scalable reinforcement learning algorithm that
uses standard supervised learning methods as subroutines. Our goal is an algorithm that …
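The snippet above describes policy improvement built from supervised regression. A minimal sketch of the advantage-weighted idea, assuming precomputed advantage estimates, a fixed temperature `beta`, and a weight clip (all illustrative choices, not the paper's exact formulation):

```python
import numpy as np

def awr_weights(advantages, beta=1.0, max_weight=20.0):
    """Exponential advantage weights that upweight high-advantage
    logged actions. `beta` and `max_weight` are illustrative defaults."""
    w = np.exp(np.asarray(advantages, dtype=np.float64) / beta)
    return np.minimum(w, max_weight)  # clipping keeps weights bounded

def awr_policy_loss(log_probs, advantages, beta=1.0):
    """Weighted negative log-likelihood: an ordinary supervised
    regression loss on logged actions, reweighted per sample."""
    w = awr_weights(advantages, beta)
    return -np.mean(w * np.asarray(log_probs, dtype=np.float64))
```

With zero advantage every sample gets weight 1 and the loss reduces to plain behavioral cloning, which is the sense in which the method uses supervised learning as a subroutine.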

Revisiting fundamentals of experience replay

W Fedus, P Ramachandran… - International …, 2020 - proceedings.mlr.press
Experience replay is central to off-policy algorithms in deep reinforcement learning (RL), but
there remain significant gaps in our understanding. We therefore present a systematic and …
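The component under study is easy to sketch. A minimal fixed-capacity, uniform-sampling replay buffer (class and parameter names are illustrative, not taken from the paper):

```python
import random
from collections import deque

class ReplayBuffer:
    """FIFO buffer of transitions with uniform random sampling,
    the standard experience-replay mechanism in off-policy deep RL."""
    def __init__(self, capacity=10_000, seed=None):
        self.buffer = deque(maxlen=capacity)  # oldest transitions evicted first
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size):
        # Uniform sampling without replacement from the stored transitions.
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

The paper's questions (e.g. how capacity and the ratio of gradient updates to collected data affect performance) are precisely about tuning knobs like `capacity` and the sampling scheme in a buffer of this shape.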

When should we prefer offline reinforcement learning over behavioral cloning?

A Kumar, J Hong, A Singh, S Levine - arXiv preprint arXiv:2204.05618, 2022 - arxiv.org
Offline reinforcement learning (RL) algorithms can acquire effective policies by utilizing
previously collected experience, without any online interaction. It is widely understood that …

Datasets and benchmarks for offline safe reinforcement learning

Z Liu, Z Guo, H Lin, Y Yao, J Zhu, Z Cen, H Hu… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper presents a comprehensive benchmarking suite tailored to offline safe
reinforcement learning (RL) challenges, aiming to foster progress in the development and …