A review of off-policy evaluation in reinforcement learning

M Uehara, C Shi, N Kallus - arXiv preprint arXiv:2212.06355, 2022 - arxiv.org
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …
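
Among the estimators such a review covers is the classical importance-sampling family. As an illustrative sketch only (not the paper's code; the trajectory format and the `pi_e` interface are assumptions), a minimal trajectory-wise estimator:

```python
import numpy as np

def is_ope(trajectories, pi_e, gamma=0.99):
    """Trajectory-wise importance-sampling OPE estimate.

    trajectories: list of trajectories, each [(s, a, r, pi_b_prob), ...],
        where pi_b_prob is the behavior policy's probability of a at s.
    pi_e: callable (s, a) -> probability under the evaluation policy.
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r, pb) in enumerate(traj):
            weight *= pi_e(s, a) / pb   # cumulative importance ratio
            ret += (gamma ** t) * r     # discounted return
        estimates.append(weight * ret)  # reweight the whole return
    return float(np.mean(estimates))
```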

A survey on causal reinforcement learning

Y Zeng, R Cai, F Sun, L Huang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
While reinforcement learning (RL) achieves tremendous success in sequential decision-
making problems of many domains, it still faces key challenges of data inefficiency and the …

Sensitivity analysis of individual treatment effects: A robust conformal inference approach

Y Jin, Z Ren, EJ Candès - Proceedings of the National …, 2023 - National Acad Sciences
We propose a model-free framework for sensitivity analysis of individual treatment effects
(ITEs), building upon ideas from conformal inference. For any unit, our procedure reports the …
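
For orientation, a minimal sketch of the split-conformal building block such procedures start from; the function name and interface are illustrative, and the paper's actual robust procedure further reweights conformity scores under a sensitivity model for unmeasured confounding:

```python
import numpy as np

def split_conformal_interval(cal_residuals, test_prediction, alpha=0.1):
    """Generic split-conformal predictive interval.

    cal_residuals: |y - mu_hat(x)| on a held-out calibration set.
    test_prediction: mu_hat(x) for the new unit.
    Returns an interval covering the outcome with prob >= 1 - alpha.
    """
    n = len(cal_residuals)
    # Finite-sample-corrected quantile of the calibration residuals.
    level = min(1.0, np.ceil((n + 1) * (1 - alpha)) / n)
    q = np.quantile(cal_residuals, level, method="higher")
    return test_prediction - q, test_prediction + q
```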

Comparing causal frameworks: Potential outcomes, structural models, graphs, and abstractions

D Ibeling, T Icard - Advances in Neural Information …, 2024 - proceedings.neurips.cc
The aim of this paper is to make clear and precise the relationship between the Rubin
causal model (RCM) and structural causal model (SCM) frameworks for causal inference …

Learning deep features in instrumental variable regression

L Xu, Y Chen, S Srinivasan, N de Freitas… - arXiv preprint arXiv …, 2020 - arxiv.org
Instrumental variable (IV) regression is a standard strategy for learning causal relationships
between confounded treatment and outcome variables from observational data by utilizing …
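
For context, a sketch of the classical two-stage least squares baseline that deep-feature IV methods generalize; this is a generic illustration, not the paper's algorithm, which instead learns neural-network features for both stages:

```python
import numpy as np

def two_stage_least_squares(Z, X, Y):
    """Classical linear 2SLS.

    Z: instruments, shape (n, p); X: treatments, shape (n, q);
    Y: outcomes, shape (n,).
    """
    # Stage 1: regress treatment on instruments to strip out confounding.
    beta1, *_ = np.linalg.lstsq(Z, X, rcond=None)
    X_hat = Z @ beta1
    # Stage 2: regress outcome on the predicted (exogenous) treatment.
    beta2, *_ = np.linalg.lstsq(X_hat, Y, rcond=None)
    return beta2  # estimated causal effect of X on Y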

A minimax learning approach to off-policy evaluation in confounded partially observable Markov decision processes

C Shi, M Uehara, J Huang… - … Conference on Machine …, 2022 - proceedings.mlr.press
We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes
(POMDPs), where the evaluation policy depends only on observable variables and the …

Finite sample analysis of minimax offline reinforcement learning: Completeness, fast rates and first-order efficiency

M Uehara, M Imaizumi, N Jiang, N Kallus… - arXiv preprint arXiv …, 2021 - arxiv.org
We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement
learning using function approximation for marginal importance weights and $q$-functions …
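
For background, the marginal importance weighting identity underlying such estimators can be stated as follows (a standard formulation, not quoted from the paper):

```latex
% With d^{\pi_b} the discounted state-action occupancy of the behavior
% policy and w(s,a) = d^{\pi_e}(s,a)/d^{\pi_b}(s,a) the marginal
% importance weight, the evaluation policy's value satisfies
V(\pi_e) \;=\; \frac{1}{1-\gamma}\,
  \mathbb{E}_{(s,a,r)\sim d^{\pi_b}}\bigl[\, w(s,a)\, r \,\bigr].
% Minimax methods estimate w and the q-function jointly as the
% solution of a saddle-point problem over two function classes.
```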

Universal off-policy evaluation

Y Chandak, S Niekum, B da Silva… - Advances in …, 2021 - proceedings.neurips.cc
When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …
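
A hedged sketch of the underlying idea: estimate the return distribution under the new policy by importance weighting, after which any distributional parameter (mean, variance, quantiles, CVaR) can be read off, which is the sense in which OPE becomes universal. The function and data format are illustrative assumptions, not the authors' exact estimator:

```python
import numpy as np

def weighted_return_cdf(returns, weights, threshold):
    """Self-normalized importance-weighted estimate of P(G <= threshold)
    under the evaluation policy.

    returns: per-trajectory returns observed under the behavior policy.
    weights: per-trajectory importance ratios (products of pi_e / pi_b).
    """
    returns = np.asarray(returns, dtype=float)
    weights = np.asarray(weights, dtype=float)
    return float(np.sum(weights * (returns <= threshold)) / np.sum(weights))
```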

Off-policy confidence interval estimation with confounded Markov decision process

C Shi, J Zhu, Y Shen, S Luo, H Zhu… - Journal of the American …, 2024 - Taylor & Francis
This article is concerned with constructing a confidence interval for a target policy's value
offline based on pre-collected observational data in infinite horizon settings. Most of the …
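
For contrast, a generic normal-approximation interval over per-trajectory OPE estimates (an illustrative baseline only; the cited paper's construction additionally accounts for unmeasured confounding, which this does not):

```python
import numpy as np
from scipy import stats

def normal_ci(per_traj_estimates, level=0.95):
    """Normal-approximation CI for a policy value from per-trajectory
    OPE estimates (e.g., importance-weighted returns)."""
    est = np.asarray(per_traj_estimates, dtype=float)
    se = est.std(ddof=1) / np.sqrt(len(est))   # standard error of the mean
    z = stats.norm.ppf(0.5 + level / 2.0)      # two-sided normal quantile
    return est.mean() - z * se, est.mean() + z * se
```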

Proximal reinforcement learning: Efficient off-policy evaluation in partially observed Markov decision processes

A Bennett, N Kallus - Operations Research, 2024 - pubsonline.informs.org
In applications of offline reinforcement learning to observational data, such as in healthcare
or education, a general concern is that observed actions might be affected by unobserved …