Offline reinforcement learning: Tutorial, review, and perspectives on open problems

S Levine, A Kumar, G Tucker, J Fu - arXiv preprint arXiv:2005.01643, 2020 - arxiv.org
In this tutorial article, we aim to provide the reader with the conceptual tools needed to get
started on research on offline reinforcement learning algorithms: reinforcement learning …

A review of off-policy evaluation in reinforcement learning

M Uehara, C Shi, N Kallus - arXiv preprint arXiv:2212.06355, 2022 - arxiv.org
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …

Optimized glycemic control of type 2 diabetes with reinforcement learning: a proof-of-concept trial

G Wang, X Liu, Z Ying, G Yang, Z Chen, Z Liu… - Nature Medicine, 2023 - nature.com
The personalized titration and optimization of insulin regimens for treatment of type 2
diabetes (T2D) are resource-demanding healthcare tasks. Here we propose a model-based …

Is pessimism provably efficient for offline RL?

Y Jin, Z Yang, Z Wang - International Conference on …, 2021 - proceedings.mlr.press
We study offline reinforcement learning (RL), which aims to learn an optimal policy based on
a dataset collected a priori. Due to the lack of further interactions with the environment …

Challenges of real-world reinforcement learning

G Dulac-Arnold, D Mankowitz, T Hester - arXiv preprint arXiv:1904.12901, 2019 - arxiv.org
Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is
beginning to show some successes in real-world scenarios. However, much of the research …

DualDICE: Behavior-agnostic estimation of discounted stationary distribution corrections

O Nachum, Y Chow, B Dai, L Li - Advances in neural …, 2019 - proceedings.neurips.cc
In many real-world reinforcement learning applications, access to the environment is limited
to a fixed dataset, instead of direct (online) interaction with the environment. When using this …

Way off-policy batch deep reinforcement learning of implicit human preferences in dialog

N Jaques, A Ghandeharioun, JH Shen… - arXiv preprint arXiv …, 2019 - arxiv.org
Most deep reinforcement learning (RL) systems are not able to learn effectively from off-
policy data, especially if they cannot explore online in the environment. These are critical …

Batch policy learning under constraints

H Le, C Voloshin, Y Yue - International Conference on …, 2019 - proceedings.mlr.press
When learning policies for real-world domains, two important questions arise: (i) how to
efficiently use pre-collected off-policy, non-optimal behavior data; and (ii) how to mediate …

Provable benefits of actor-critic methods for offline reinforcement learning

A Zanette, MJ Wainwright… - Advances in neural …, 2021 - proceedings.neurips.cc
Actor-critic methods are widely used in offline reinforcement learning practice, but are not so
well-understood theoretically. We propose a new offline actor-critic algorithm that naturally …

Doubly robust joint learning for recommendation on data missing not at random

X Wang, R Zhang, Y Sun, J Qi - International Conference on …, 2019 - proceedings.mlr.press
In recommender systems, usually the ratings of a user to most items are missing and a
critical problem is that the missing ratings are often missing not at random (MNAR) in reality …