A review of off-policy evaluation in reinforcement learning

M Uehara, C Shi, N Kallus - arXiv preprint arXiv:2212.06355, 2022 - arxiv.org
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …

AI and personalization

O Rafieian, H Yoganarasimhan - Artificial Intelligence in Marketing, 2023 - emerald.com
This chapter reviews the recent developments at the intersection of personalization and AI in
marketing and related fields. We provide a formal definition of personalized policy and …

Settling the sample complexity of model-based offline reinforcement learning

G Li, L Shi, Y Chen, Y Chi, Y Wei - The Annals of Statistics, 2024 - projecteuclid.org
The Annals of Statistics 2024, Vol. 52, No. 1, 233–260. https://doi.org/10.1214/23-AOS2342 …

Double reinforcement learning for efficient off-policy evaluation in Markov decision processes

N Kallus, M Uehara - Journal of Machine Learning Research, 2020 - jmlr.org
Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision
policies without needing to conduct exploration, which is often costly or otherwise infeasible …

Minimax weight and Q-function learning for off-policy evaluation

M Uehara, J Huang, N Jiang - International Conference on …, 2020 - proceedings.mlr.press
We provide theoretical investigations into off-policy evaluation in reinforcement learning
using function approximators for (marginalized) importance weights and value functions. Our …

DoubleML - an object-oriented implementation of double machine learning in Python

P Bach, V Chernozhukov, MS Kurz… - Journal of Machine …, 2022 - jmlr.org
DoubleML is an open-source Python library implementing the double machine learning
framework of Chernozhukov et al. (2018) for a variety of causal models. It contains …

Empirical study of off-policy policy evaluation for reinforcement learning

C Voloshin, HM Le, N Jiang, Y Yue - arXiv preprint arXiv:1911.06854, 2019 - arxiv.org
We offer an experimental benchmark and empirical study for off-policy policy evaluation
(OPE) in reinforcement learning, which is a key problem in many safety-critical applications …

Off-policy evaluation via the regularized Lagrangian

M Yang, O Nachum, B Dai, L Li… - Advances in Neural …, 2020 - proceedings.neurips.cc
The recently proposed distribution correction estimation (DICE) family of estimators has
advanced the state of the art in off-policy evaluation from behavior-agnostic data. While …

Toward theoretical understandings of robust Markov decision processes: Sample complexity and asymptotics

W Yang, L Zhang, Z Zhang - The Annals of Statistics, 2022 - projecteuclid.org
The Annals of Statistics 2022, Vol. 50, No. 6, 3223–3248 …

Reinforcement learning via Fenchel-Rockafellar duality

O Nachum, B Dai - arXiv preprint arXiv:2001.01866, 2020 - arxiv.org
We review basic concepts of convex duality, focusing on the very general and supremely
useful Fenchel-Rockafellar duality. We summarize how this duality may be applied to a …