- Academic Search

Offline reinforcement learning: Tutorial, review, and perspectives on open problems

S Levine, A Kumar, G Tucker, J Fu - arXiv preprint arXiv:2005.01643, 2020 - arxiv.org

In this tutorial article, we aim to provide the reader with the conceptual tools needed to get
started on research on offline reinforcement learning algorithms: reinforcement learning …

Cited by 2195

[PDF] arxiv.org

A review of off-policy evaluation in reinforcement learning

M Uehara, C Shi, N Kallus - arXiv preprint arXiv:2212.06355, 2022 - arxiv.org

Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …

Cited by 78

[PDF] nature.com

Optimized glycemic control of type 2 diabetes with reinforcement learning: a proof-of-concept trial

G Wang, X Liu, Z Ying, G Yang, Z Chen, Z Liu… - Nature Medicine, 2023 - nature.com

The personalized titration and optimization of insulin regimens for treatment of type 2
diabetes (T2D) are resource-demanding healthcare tasks. Here we propose a model-based …

Cited by 79

[PDF] mlr.press

Is pessimism provably efficient for offline RL?

Y Jin, Z Yang, Z Wang - International Conference on …, 2021 - proceedings.mlr.press

We study offline reinforcement learning (RL), which aims to learn an optimal policy based on
a dataset collected a priori. Due to the lack of further interactions with the environment …

Cited by 461

[PDF] arxiv.org

Challenges of real-world reinforcement learning

G Dulac-Arnold, D Mankowitz, T Hester - arXiv preprint arXiv:1904.12901, 2019 - arxiv.org

Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is
beginning to show some successes in real-world scenarios. However, much of the research …

Cited by 709

[PDF] neurips.cc

DualDICE: Behavior-agnostic estimation of discounted stationary distribution corrections

O Nachum, Y Chow, B Dai, L Li - Advances in neural …, 2019 - proceedings.neurips.cc

In many real-world reinforcement learning applications, access to the environment is limited
to a fixed dataset, instead of direct (online) interaction with the environment. When using this …

Cited by 383

[PDF] arxiv.org

Way off-policy batch deep reinforcement learning of implicit human preferences in dialog

N Jaques, A Ghandeharioun, JH Shen… - arXiv preprint arXiv …, 2019 - arxiv.org

Most deep reinforcement learning (RL) systems are not able to learn effectively from off-policy
data, especially if they cannot explore online in the environment. These are critical …

Cited by 369

[PDF] mlr.press

Batch policy learning under constraints

H Le, C Voloshin, Y Yue - International Conference on …, 2019 - proceedings.mlr.press

When learning policies for real-world domains, two important questions arise: (i) how to
efficiently use pre-collected off-policy, non-optimal behavior data; and (ii) how to mediate …

Cited by 370

[PDF] neurips.cc

Provable benefits of actor-critic methods for offline reinforcement learning

A Zanette, MJ Wainwright… - Advances in neural …, 2021 - proceedings.neurips.cc

Actor-critic methods are widely used in offline reinforcement learning practice, but are not so
well-understood theoretically. We propose a new offline actor-critic algorithm that naturally …

Cited by 143

[PDF] mlr.press

Doubly robust joint learning for recommendation on data missing not at random

X Wang, R Zhang, Y Sun, J Qi - International Conference on …, 2019 - proceedings.mlr.press

In recommender systems, usually the ratings of a user to most items are missing and a
critical problem is that the missing ratings are often missing not at random (MNAR) in reality …

Cited by 267

More robust doubly robust off-policy evaluation

Offline reinforcement learning: Tutorial, review, and perspectives on open problems
