Offline reinforcement learning: Tutorial, review, and perspectives on open problems
In this tutorial article, we aim to provide the reader with the conceptual tools needed to get
started on research on offline reinforcement learning algorithms: reinforcement learning …
A review of off-policy evaluation in reinforcement learning
Reinforcement learning (RL) is one of the most vibrant research frontiers in machine
learning and has been recently applied to solve a number of challenging problems. In this …
Optimized glycemic control of type 2 diabetes with reinforcement learning: a proof-of-concept trial
G Wang, X Liu, Z Ying, G Yang, Z Chen, Z Liu… - Nature Medicine, 2023 - nature.com
The personalized titration and optimization of insulin regimens for treatment of type 2
diabetes (T2D) are resource-demanding healthcare tasks. Here we propose a model-based …
Is pessimism provably efficient for offline RL?
We study offline reinforcement learning (RL), which aims to learn an optimal policy based on
a dataset collected a priori. Due to the lack of further interactions with the environment …
Challenges of real-world reinforcement learning
Reinforcement learning (RL) has proven its worth in a series of artificial domains, and is
beginning to show some successes in real-world scenarios. However, much of the research …
DualDICE: Behavior-agnostic estimation of discounted stationary distribution corrections
In many real-world reinforcement learning applications, access to the environment is limited
to a fixed dataset, instead of direct (online) interaction with the environment. When using this …
Way off-policy batch deep reinforcement learning of implicit human preferences in dialog
Most deep reinforcement learning (RL) systems are not able to learn effectively from
off-policy data, especially if they cannot explore online in the environment. These are critical …
Batch policy learning under constraints
When learning policies for real-world domains, two important questions arise: (i) how to
efficiently use pre-collected off-policy, non-optimal behavior data; and (ii) how to mediate …
Provable benefits of actor-critic methods for offline reinforcement learning
Actor-critic methods are widely used in offline reinforcement learning practice, but are not so
well-understood theoretically. We propose a new offline actor-critic algorithm that naturally …
Doubly robust joint learning for recommendation on data missing not at random
In recommender systems, usually the ratings of a user to most items are missing and a
critical problem is that the missing ratings are often missing not at random (MNAR) in reality …