Q-learning: Theory and applications

J Clifton, E Laber - Annual Review of Statistics and Its …, 2020 - annualreviews.org
Q-learning, originally an incremental algorithm for estimating an optimal decision strategy in
an infinite-horizon decision problem, now refers to a general class of reinforcement learning …

Reinforcement learning for clinical decision support in critical care: comprehensive review

S Liu, KC See, KY Ngiam, LA Celi, X Sun… - Journal of medical Internet …, 2020 - jmir.org
Background: Decision support systems based on reinforcement learning (RL) have been
implemented to facilitate the delivery of personalized care. This paper aimed to provide a …

[BOOK] Control systems and reinforcement learning

S Meyn - 2022 - books.google.com
A high school student can create deep Q-learning code to control her robot, without any
understanding of the meaning of 'deep' or 'Q', or why the code sometimes fails. This book is …

A two-timescale stochastic algorithm framework for bilevel optimization: Complexity analysis and application to actor-critic

M Hong, HT Wai, Z Wang, Z Yang - SIAM Journal on Optimization, 2023 - SIAM
This paper analyzes a two-timescale stochastic algorithm framework for bilevel optimization.
Bilevel optimization is a class of problems which exhibits a two-level structure, and its goal is …

Sample-optimal parametric Q-learning using linearly additive features

L Yang, M Wang - International conference on machine …, 2019 - proceedings.mlr.press
Consider a Markov decision process (MDP) that admits a set of state-action features, which
can linearly express the process's probabilistic transition model. We propose a parametric Q …

Online robust reinforcement learning with model uncertainty

Y Wang, S Zou - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc
Robust reinforcement learning (RL) is to find a policy that optimizes the worst-case
performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust …

Deterministic policy gradient algorithms

D Silver, G Lever, N Heess, T Degris… - International …, 2014 - proceedings.mlr.press
In this paper we consider deterministic policy gradient algorithms for reinforcement learning
with continuous actions. The deterministic policy gradient has a particularly appealing form …

Playing Atari with deep reinforcement learning

V Mnih - arXiv preprint arXiv:1312.5602, 2013 - people.engr.tamu.edu
We present the first deep learning model to successfully learn control policies directly from
high-dimensional sensory input using reinforcement learning. The model is a convolutional …

Deep exploration via randomized value functions

I Osband, B Van Roy, DJ Russo, Z Wen - Journal of Machine Learning …, 2019 - jmlr.org
We study the use of randomized value functions to guide deep exploration in reinforcement
learning. This offers an elegant means for synthesizing statistically and computationally …

SBEED: Convergent reinforcement learning with nonlinear function approximation

B Dai, A Shaw, L Li, L Xiao, N He… - International …, 2018 - proceedings.mlr.press
When function approximation is used, solving the Bellman optimality equation with stability
guarantees has remained a major open problem in reinforcement learning for decades. The …