Google Scholar

Q-learning: Theory and applications

J Clifton, E Laber - Annual Review of Statistics and Its …, 2020 - annualreviews.org

Q-learning, originally an incremental algorithm for estimating an optimal decision strategy in
an infinite-horizon decision problem, now refers to a general class of reinforcement learning …

Cited by 366 - All 4 versions

[HTML] jmir.org

[HTML] Reinforcement learning for clinical decision support in critical care: comprehensive review

S Liu, KC See, KY Ngiam, LA Celi, X Sun… - Journal of medical Internet …, 2020 - jmir.org

Background Decision support systems based on reinforcement learning (RL) have been
implemented to facilitate the delivery of personalized care. This paper aimed to provide a …

Cited by 223 - All 11 versions

[BOOK] Control systems and reinforcement learning

S Meyn - 2022 - books.google.com

A high school student can create deep Q-learning code to control her robot, without any
understanding of the meaning of 'deep' or 'Q', or why the code sometimes fails. This book is …

Cited by 158 - All 3 versions

[PDF] arxiv.org

A two-timescale stochastic algorithm framework for bilevel optimization: Complexity analysis and application to actor-critic

M Hong, HT Wai, Z Wang, Z Yang - SIAM Journal on Optimization, 2023 - SIAM

This paper analyzes a two-timescale stochastic algorithm framework for bilevel optimization.
Bilevel optimization is a class of problems which exhibits a two-level structure, and its goal is …

Cited by 319 - All 5 versions

[PDF] mlr.press

Sample-optimal parametric Q-learning using linearly additive features

L Yang, M Wang - International conference on machine …, 2019 - proceedings.mlr.press

Consider a Markov decision process (MDP) that admits a set of state-action features, which
can linearly express the process's probabilistic transition model. We propose a parametric Q …

Cited by 367 - All 9 versions

[PDF] neurips.cc

Online robust reinforcement learning with model uncertainty

Y Wang, S Zou - Advances in Neural Information Processing …, 2021 - proceedings.neurips.cc

Robust reinforcement learning (RL) is to find a policy that optimizes the worst-case
performance over an uncertainty set of MDPs. In this paper, we focus on model-free robust …

Cited by 114 - All 10 versions

[PDF] mlr.press

Deterministic policy gradient algorithms

D Silver, G Lever, N Heess, T Degris… - International …, 2014 - proceedings.mlr.press

In this paper we consider deterministic policy gradient algorithms for reinforcement learning
with continuous actions. The deterministic policy gradient has a particularly appealing form …

Cited by 5693 - All 32 versions

[PDF] tamu.edu

[PDF] Playing Atari with deep reinforcement learning

V Mnih - arXiv preprint arXiv:1312.5602, 2013 - people.engr.tamu.edu

We present the first deep learning model to successfully learn control policies directly from
high-dimensional sensory input using reinforcement learning. The model is a convolutional …

Cited by 16981

[PDF] jmlr.org

Deep exploration via randomized value functions

I Osband, B Van Roy, DJ Russo, Z Wen - Journal of Machine Learning …, 2019 - jmlr.org

We study the use of randomized value functions to guide deep exploration in reinforcement
learning. This offers an elegant means for synthesizing statistically and computationally …

Cited by 360 - All 9 versions

[PDF] mlr.press

SBEED: Convergent reinforcement learning with nonlinear function approximation

B Dai, A Shaw, L Li, L Xiao, N He… - International …, 2018 - proceedings.mlr.press

When function approximation is used, solving the Bellman optimality equation with stability
guarantees has remained a major open problem in reinforcement learning for decades. The …

Cited by 321 - All 8 versions

Toward off-policy learning control with function approximation.

Q-learning: Theory and applications
