Q-learning: Theory and applications
J Clifton, E Laber - Annual Review of Statistics and Its …, 2020 - annualreviews.org
Q-learning, originally an incremental algorithm for estimating an optimal decision strategy in
an infinite-horizon decision problem, now refers to a general class of reinforcement learning …
[HTML][HTML] Reinforcement learning for clinical decision support in critical care: comprehensive review
Background Decision support systems based on reinforcement learning (RL) have been
implemented to facilitate the delivery of personalized care. This paper aimed to provide a …
[BOOK][B] Control systems and reinforcement learning
S Meyn - 2022 - books.google.com
A high school student can create deep Q-learning code to control her robot, without any
understanding of the meaning of 'deep' or 'Q', or why the code sometimes fails. This book is …
A two-timescale stochastic algorithm framework for bilevel optimization: Complexity analysis and application to actor-critic
This paper analyzes a two-timescale stochastic algorithm framework for bilevel optimization.
Bilevel optimization is a class of problems which exhibits a two-level structure, and its goal is …
Sample-optimal parametric q-learning using linearly additive features
Consider a Markov decision process (MDP) that admits a set of state-action features, which
can linearly express the process's probabilistic transition model. We propose a parametric Q …
Online robust reinforcement learning with model uncertainty
Robust reinforcement learning (RL) is to find a policy that optimizes the worst-case
performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust …
Deterministic policy gradient algorithms
In this paper we consider deterministic policy gradient algorithms for reinforcement learning
with continuous actions. The deterministic policy gradient has a particularly appealing form …
[PDF][PDF] Playing atari with deep reinforcement learning
V Mnih - arXiv preprint arXiv:1312.5602, 2013 - people.engr.tamu.edu
We present the first deep learning model to successfully learn control policies directly from
high-dimensional sensory input using reinforcement learning. The model is a convolutional …
Deep exploration via randomized value functions
We study the use of randomized value functions to guide deep exploration in reinforcement
learning. This offers an elegant means for synthesizing statistically and computationally …
SBEED: Convergent reinforcement learning with nonlinear function approximation
When function approximation is used, solving the Bellman optimality equation with stability
guarantees has remained a major open problem in reinforcement learning for decades. The …