A review of off-policy evaluation in reinforcement learning
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …
Off-policy evaluation for large action spaces via conjunct effect modeling
We study off-policy evaluation (OPE) of contextual bandit policies for large discrete action
spaces where conventional importance-weighting approaches suffer from excessive …
CoinDICE: Off-policy confidence interval estimation
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning,
where the goal is to estimate a confidence interval on a target policy's value, given only …
Near-optimal offline reinforcement learning via double variance reduction
We consider the problem of offline reinforcement learning (RL)---a well-motivated setting of
RL that aims at policy optimization using only historical data. Despite its wide applicability …
Minimax value interval for off-policy evaluation and policy optimization
We study minimax methods for off-policy evaluation (OPE) using value functions and
marginalized importance weights. Despite that they hold promises of overcoming the …
Universal off-policy evaluation
When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …
Instabilities of offline RL with pre-trained neural representation
In offline reinforcement learning (RL), we seek to utilize offline data to evaluate (or learn)
policies in scenarios where the data are collected from a distribution that substantially differs …
Importance sampling techniques for policy optimization
How can we effectively exploit the collected samples when solving a continuous control task
with Reinforcement Learning? Recent results have empirically demonstrated that multiple …
Flexible option learning
Temporal abstraction in reinforcement learning (RL) offers the promise of improving
generalization and knowledge transfer in complex environments, by propagating information …
Doubly robust bias reduction in infinite horizon off-policy estimation
Infinite horizon off-policy policy evaluation is a highly challenging task due to the excessively
large variance of typical importance sampling (IS) estimators. Recently, Liu et al. (2018a) …