Offline reinforcement learning: Tutorial, review, and perspectives on open problems
In this tutorial article, we aim to provide the reader with the conceptual tools needed to get
started on research on offline reinforcement learning algorithms: reinforcement learning …
A review of robot learning for manipulation: Challenges, representations, and algorithms
A key challenge in intelligent robotics is creating robots that are capable of directly
interacting with the world around them to achieve their goals. The last decade has seen …
RT-1: Robotics Transformer for real-world control at scale
A Brohan, N Brown, J Carbajal, Y Chebotar… - arxiv preprint arxiv …, 2022 - arxiv.org
By transferring knowledge from large, diverse, task-agnostic datasets, modern machine
learning models can solve specific downstream tasks either zero-shot or with small task …
The artificial intelligence clinician learns optimal treatment strategies for sepsis in intensive care
Sepsis is the third leading cause of death worldwide and the main cause of mortality in
hospitals, but the best treatment strategy remains uncertain. In particular, evidence …
Coindice: Off-policy confidence interval estimation
We study high-confidence behavior-agnostic off-policy evaluation in reinforcement learning,
where the goal is to estimate a confidence interval on a target policy's value, given only …
Verifying learning-augmented systems
T Eliyahu, Y Kazak, G Katz, M Schapira - Proceedings of the 2021 ACM …, 2021 - dl.acm.org
The application of deep reinforcement learning (DRL) to computer and networked systems
has recently gained significant popularity. However, the obscurity of decisions by DRL …
Universal off-policy evaluation
When faced with sequential decision-making problems, it is often useful to be able to predict
what would happen if decisions were made using a new policy. Those predictions must …
Learning when-to-treat policies
Many applied decision-making problems have a dynamic component: The policymaker
needs not only to choose whom to treat, but also when to start which treatment. For example …
Off-policy policy evaluation for sequential decisions under unobserved confounding
H Namkoong, R Keramati… - Advances in Neural …, 2020 - proceedings.neurips.cc
When observed decisions depend only on observed features, off-policy policy evaluation
(OPE) methods for sequential decision problems can estimate the performance of evaluation …
An instrumental variable approach to confounded off-policy evaluation
Off-policy evaluation (OPE) aims to estimate the return of a target policy using some pre-
collected observational data generated by a potentially different behavior policy. In many …
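Several of the snippets above define off-policy evaluation the same way: estimate a target policy's return from data logged by a different behavior policy. A minimal sketch of the ordinary (per-trajectory) importance-sampling estimator, the simplest member of this family, might look like the following; the `pi_e` and `pi_b` callables and the trajectory layout are illustrative assumptions, not from any of the papers listed.

```python
import numpy as np

def ope_importance_sampling(trajectories, pi_e, pi_b, gamma=0.99):
    """Ordinary importance-sampling OPE estimate of the target policy's
    expected discounted return, using only behavior-policy data.

    trajectories : list of trajectories, each a list of (state, action, reward)
    pi_e(a, s)   : target (evaluation) policy's probability of action a in state s
    pi_b(a, s)   : behavior policy's probability of action a in state s
    """
    estimates = []
    for traj in trajectories:
        weight, ret = 1.0, 0.0
        for t, (s, a, r) in enumerate(traj):
            # Accumulate the likelihood ratio between the two policies.
            weight *= pi_e(a, s) / pi_b(a, s)
            ret += (gamma ** t) * r
        # Each trajectory contributes its reweighted return.
        estimates.append(weight * ret)
    return float(np.mean(estimates))
```

With single-step data from a uniform behavior policy (`pi_b = 0.5` per action) and a target policy that always picks action 1, the estimator recovers the target policy's value exactly in expectation; in practice the products of likelihood ratios make the estimate high-variance over long horizons, which is precisely what the confidence-interval and confounding-robust methods in the entries above address.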