A review of off-policy evaluation in reinforcement learning
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …
AI and personalization
This chapter reviews the recent developments at the intersection of personalization and AI in
marketing and related fields. We provide a formal definition of personalized policy and …
Settling the sample complexity of model-based offline reinforcement learning
The Annals of Statistics 2024, Vol. 52, No. 1, 233–260 https://doi.org/10.1214/23-AOS2342 © …
Double reinforcement learning for efficient off-policy evaluation in Markov decision processes
Off-policy evaluation (OPE) in reinforcement learning allows one to evaluate novel decision
policies without needing to conduct exploration, which is often costly or otherwise infeasible …
Minimax weight and Q-function learning for off-policy evaluation
We provide theoretical investigations into off-policy evaluation in reinforcement learning
using function approximators for (marginalized) importance weights and value functions. Our …
DoubleML: an object-oriented implementation of double machine learning in Python
DoubleML is an open-source Python library implementing the double machine learning
framework of Chernozhukov et al. (2018) for a variety of causal models. It contains …
Empirical study of off-policy policy evaluation for reinforcement learning
We offer an experimental benchmark and empirical study for off-policy policy evaluation
(OPE) in reinforcement learning, which is a key problem in many safety-critical applications …
Off-policy evaluation via the regularized Lagrangian
The recently proposed distribution correction estimation (DICE) family of estimators has
advanced the state of the art in off-policy evaluation from behavior-agnostic data. While …
Toward theoretical understandings of robust Markov decision processes: Sample complexity and asymptotics
The Annals of Statistics 2022, Vol. 50, No. 6, 3223–3248 …
Reinforcement learning via Fenchel-Rockafellar duality
We review basic concepts of convex duality, focusing on the very general and supremely
useful Fenchel-Rockafellar duality. We summarize how this duality may be applied to a …