Offline reinforcement learning: Tutorial, review, and perspectives on open problems

S Levine, A Kumar, G Tucker, J Fu - arXiv preprint arXiv:2005.01643, 2020 - arxiv.org
In this tutorial article, we aim to provide the reader with the conceptual tools needed to get
started on research on offline reinforcement learning algorithms: reinforcement learning …

Neural approaches to conversational AI

J Gao, M Galley, L Li - The 41st international ACM SIGIR conference on …, 2018 - dl.acm.org
This tutorial surveys neural approaches to conversational AI that were developed in the last
few years. We group conversational systems into three categories: (1) question answering …

Conservative Q-learning for offline reinforcement learning

A Kumar, A Zhou, G Tucker… - Advances in neural …, 2020 - proceedings.neurips.cc
Effectively leveraging large, previously collected datasets in reinforcement learning (RL) is
a key challenge for large-scale real-world applications. Offline RL algorithms promise to …

Is pessimism provably efficient for offline RL?

Y Jin, Z Yang, Z Wang - International Conference on …, 2021 - proceedings.mlr.press
We study offline reinforcement learning (RL), which aims to learn an optimal policy based on
a dataset collected a priori. Due to the lack of further interactions with the environment …

Behavior regularized offline reinforcement learning

Y Wu, G Tucker, O Nachum - arXiv preprint arXiv:1911.11361, 2019 - arxiv.org
In reinforcement learning (RL) research, it is common to assume access to direct online
interactions with the environment. However, in many real-world applications, access to the …

Offline reinforcement learning with realizability and single-policy concentrability

W Zhan, B Huang, A Huang… - … on Learning Theory, 2022 - proceedings.mlr.press
Sample-efficiency guarantees for offline reinforcement learning (RL) often rely on strong
assumptions on both the function classes (e.g., Bellman-completeness) and the data …

Information-theoretic considerations in batch reinforcement learning

J Chen, N Jiang - International conference on machine …, 2019 - proceedings.mlr.press
Value-function approximation methods that operate in batch mode have foundational
importance to reinforcement learning (RL). Finite sample guarantees for these methods …

DualDICE: Behavior-agnostic estimation of discounted stationary distribution corrections

O Nachum, Y Chow, B Dai, L Li - Advances in neural …, 2019 - proceedings.neurips.cc
In many real-world reinforcement learning applications, access to the environment is limited
to a fixed dataset, instead of direct (online) interaction with the environment. When using this …

A review of off-policy evaluation in reinforcement learning

M Uehara, C Shi, N Kallus - arXiv preprint arXiv:2212.06355, 2022 - arxiv.org
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …

Batch policy learning under constraints

H Le, C Voloshin, Y Yue - International Conference on …, 2019 - proceedings.mlr.press
When learning policies for real-world domains, two important questions arise: (i) how to
efficiently use pre-collected off-policy, non-optimal behavior data; and (ii) how to mediate …