A review of off-policy evaluation in reinforcement learning

M Uehara, C Shi, N Kallus - arXiv preprint arXiv:2212.06355, 2022 - arxiv.org
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has recently been applied to solve a number of challenging problems. In this …
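
Since this survey anchors the section, it may help to recall the trajectory-wise importance-sampling (IS) estimator that most of the OPE methods below refine. A minimal sketch, with illustrative function names and a toy data layout (not code from the survey):

```python
import numpy as np

def is_estimate(trajectories, pi_e, pi_b, gamma=0.99):
    """Per-trajectory importance-sampling estimate of the target policy's value.

    trajectories: list of [(state, action, reward), ...] collected under pi_b.
    pi_e, pi_b: callables giving pi(a | s) for the evaluation/behavior policy.
    """
    values = []
    for traj in trajectories:
        rho = 1.0  # cumulative importance weight: prod_t pi_e / pi_b
        ret = 0.0  # discounted return of this trajectory
        for t, (s, a, r) in enumerate(traj):
            rho *= pi_e(a, s) / pi_b(a, s)
            ret += gamma**t * r
        values.append(rho * ret)
    return float(np.mean(values))
```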

Off-policy evaluation for large action spaces via conjunct effect modeling

Y Saito, Q Ren, T Joachims - International Conference on Machine Learning, 2023 - proceedings.mlr.press
We study off-policy evaluation (OPE) of contextual bandit policies for large discrete action
spaces where conventional importance-weighting approaches suffer from excessive …
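The "excessive variance" referred to here is easy to see in the vanilla inverse-propensity-scoring (IPS) estimator. The sketch below is that generic baseline, not the paper's conjunct-effect estimator; names and the data layout are illustrative:

```python
import numpy as np

def ips_estimate(contexts, actions, rewards, propensities, pi_e):
    """Inverse-propensity-scoring value estimate for a contextual bandit policy.

    propensities: array of pi_b(a_i | x_i), the logged action's probability
    under the behavior policy. pi_e(a, x) gives the target policy's probability.
    With many actions, pi_b(a | x) is tiny for most logged actions, so the
    weights pi_e / pi_b, and hence the variance, blow up.
    """
    w = np.array([pi_e(a, x) for a, x in zip(actions, contexts)]) / propensities
    return float(np.mean(w * rewards))
```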

CoinDICE: Off-policy confidence interval estimation

B Dai, O Nachum, Y Chow, L Li… - Advances in Neural Information Processing Systems, 2020 - proceedings.neurips.cc
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning,
where the goal is to estimate a confidence interval on a target policy's value, given only …
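For intuition about what a confidence interval on a policy's value looks like, a naive percentile bootstrap over per-trajectory estimates is sketched below. This is only a simple baseline for contrast; it is not CoinDICE's behavior-agnostic construction:

```python
import numpy as np

def bootstrap_ci(per_sample_values, alpha=0.05, n_boot=2000, seed=0):
    """Percentile-bootstrap confidence interval for a policy-value estimate.

    per_sample_values: per-trajectory OPE estimates (e.g., rho * return).
    Resamples the dataset with replacement and reads off the quantiles
    of the resampled means.
    """
    rng = np.random.default_rng(seed)
    n = len(per_sample_values)
    stats = [np.mean(rng.choice(per_sample_values, size=n, replace=True))
             for _ in range(n_boot)]
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)
```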

Near-optimal offline reinforcement learning via double variance reduction

M Yin, Y Bai, YX Wang - Advances in Neural Information Processing Systems, 2021 - proceedings.neurips.cc
We consider the problem of offline reinforcement learning (RL), a well-motivated setting of
RL that aims at policy optimization using only historical data. Despite its wide applicability …

Minimax value interval for off-policy evaluation and policy optimization

N Jiang, J Huang - Advances in Neural Information Processing Systems, 2020 - proceedings.neurips.cc
We study minimax methods for off-policy evaluation (OPE) using value functions and
marginalized importance weights. Although they hold the promise of overcoming the …

Universal off-policy evaluation

Y Chandak, S Niekum, B da Silva… - Advances in Neural Information Processing Systems, 2021 - proceedings.neurips.cc
When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …

Instabilities of offline RL with pre-trained neural representation

R Wang, Y Wu, R Salakhutdinov… - International Conference on Machine Learning, 2021 - proceedings.mlr.press
In offline reinforcement learning (RL), we seek to utilize offline data to evaluate (or learn)
policies in scenarios where the data are collected from a distribution that substantially differs …

Importance sampling techniques for policy optimization

AM Metelli, M Papini, N Montali, M Restelli - Journal of Machine Learning …, 2020 - jmlr.org
How can we effectively exploit the collected samples when solving a continuous control task
with reinforcement learning? Recent results have empirically demonstrated that multiple …
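One standard tool in this line of work on reusing samples collected under several past policies is multiple importance sampling with the balance heuristic. The sketch below is a generic illustration of that technique, not necessarily the paper's estimator; names and the flat data layout are assumptions:

```python
import numpy as np

def balance_heuristic_estimate(samples, behavior_pis, pi_target):
    """Multiple importance sampling with the balance heuristic.

    samples: list of (policy_index, action, reward), where each sample was
    drawn from behavior_pis[policy_index]. behavior_pis: list of densities
    p_k(a). pi_target: density of the policy being evaluated.
    The balance heuristic mixes all behavior densities in the denominator,
    which keeps weights bounded even when any single p_k(a) is small.
    """
    counts = np.bincount([k for k, _, _ in samples],
                         minlength=len(behavior_pis))
    n = len(samples)
    total = 0.0
    for _, a, r in samples:
        mix = sum(counts[k] / n * p(a) for k, p in enumerate(behavior_pis))
        total += pi_target(a) / mix * r
    return total / n
```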

Flexible option learning

M Klissarov, D Precup - Advances in Neural Information Processing Systems, 2021 - proceedings.neurips.cc
Temporal abstraction in reinforcement learning (RL) offers the promise of improving
generalization and knowledge transfer in complex environments by propagating information …

Doubly robust bias reduction in infinite horizon off-policy estimation

Z Tang, Y Feng, L Li, D Zhou, Q Liu - arXiv preprint arXiv:1910.07186, 2019 - arxiv.org
Infinite horizon off-policy policy evaluation is a highly challenging task due to the excessively
large variance of typical importance sampling (IS) estimators. Recently, Liu et al. (2018a) …
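
One common doubly robust form in this infinite-horizon setting combines a learned stationary density-ratio model with a learned value function; the correction term vanishes in expectation if either model is accurate, which is the bias-reduction property referred to. A sketch under the assumption that w_hat, q_hat, and v_hat have already been fit (the names are illustrative, not the paper's code):

```python
import numpy as np

def dr_infinite_horizon(transitions, init_states, w_hat, q_hat, v_hat,
                        gamma=0.99):
    """Doubly robust value estimate from a stationary density ratio and
    value functions.

    transitions: (s, a, r, s_next) tuples sampled from the behavior
    distribution. w_hat(s, a) models the target-to-behavior stationary
    density ratio; q_hat(s, a) models the target policy's Q-function;
    v_hat(s) should be the expectation of q_hat over the target policy's
    actions at s.
    """
    # Baseline term: value-function estimate at the initial distribution.
    base = (1 - gamma) * np.mean([v_hat(s0) for s0 in init_states])
    # Correction term: density-ratio-weighted Bellman residual; zero in
    # expectation if either w_hat or (q_hat, v_hat) is correct.
    corr = np.mean([w_hat(s, a) * (r + gamma * v_hat(sn) - q_hat(s, a))
                    for (s, a, r, sn) in transitions])
    return float(base + corr)
```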