A review of off-policy evaluation in reinforcement learning
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …
A survey on causal reinforcement learning
While reinforcement learning (RL) achieves tremendous success in sequential decision-
making problems of many domains, it still faces key challenges of data inefficiency and the …
Sensitivity analysis of individual treatment effects: A robust conformal inference approach
We propose a model-free framework for sensitivity analysis of individual treatment effects
(ITEs), building upon ideas from conformal inference. For any unit, our procedure reports the …
Comparing causal frameworks: Potential outcomes, structural models, graphs, and abstractions
The aim of this paper is to make clear and precise the relationship between the Rubin
causal model (RCM) and structural causal model (SCM) frameworks for causal inference …
Learning deep features in instrumental variable regression
Instrumental variable (IV) regression is a standard strategy for learning causal relationships
between confounded treatment and outcome variables from observational data by utilizing …
A minimax learning approach to off-policy evaluation in confounded partially observable Markov decision processes
We consider off-policy evaluation (OPE) in Partially Observable Markov Decision Processes
(POMDPs), where the evaluation policy depends only on observable variables and the …
Finite sample analysis of minimax offline reinforcement learning: Completeness, fast rates and first-order efficiency
We offer a theoretical characterization of off-policy evaluation (OPE) in reinforcement
learning using function approximation for marginal importance weights and $q$-functions …
Universal off-policy evaluation
When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …
Off-policy confidence interval estimation with confounded Markov decision process
This article is concerned with constructing a confidence interval for a target policy's value
offline based on pre-collected observational data in infinite horizon settings. Most of the …
Proximal reinforcement learning: Efficient off-policy evaluation in partially observed Markov decision processes
In applications of offline reinforcement learning to observational data, such as in healthcare
or education, a general concern is that observed actions might be affected by unobserved …