- Academic Search

A Lyapunov-based approach to safe reinforcement learning

Y Chow, O Nachum… - Advances in neural …, 2018 - proceedings.neurips.cc

In many real-world reinforcement learning (RL) problems, besides optimizing the main
objective function, an agent must concurrently avoid violating a number of constraints. In …

Cited by 632 · All 12 versions

[PDF] neurips.cc

Variational policy gradient method for reinforcement learning with general utilities

J Zhang, A Koppel, AS Bedi… - Advances in Neural …, 2020 - proceedings.neurips.cc

In recent years, reinforcement learning systems with general goals beyond a cumulative
sum of rewards have gained traction, such as in constrained problems, exploration, and …

Cited by 165 · All 9 versions

[PDF] aaai.org

Adaptive trust region policy optimization: Global convergence and faster rates for regularized MDPs

L Shani, Y Efroni, S Mannor - Proceedings of the AAAI Conference on …, 2020 - ojs.aaai.org

Trust region policy optimization (TRPO) is a popular and empirically successful policy
search algorithm in Reinforcement Learning (RL) in which a surrogate problem, that restricts …

Cited by 198 · All 6 versions

[PDF] nsf.gov

Global convergence of policy gradient methods to (almost) locally optimal policies

K Zhang, A Koppel, H Zhu, T Basar - SIAM Journal on Control and …, 2020 - SIAM

Policy gradient (PG) methods have been one of the most essential ingredients of
reinforcement learning, with application in a variety of domains. In spite of the empirical …

Cited by 228 · All 10 versions

[PDF] neurips.cc

An improved analysis of (variance-reduced) policy gradient and natural policy gradient methods

Y Liu, K Zhang, T Basar, W Yin - Advances in Neural …, 2020 - proceedings.neurips.cc

In this paper, we revisit and improve the convergence of policy gradient (PG), natural PG
(NPG) methods, and their variance-reduced variants, under general smooth policy …

Cited by 123 · All 8 versions

[PDF] mlr.press

Safe policy improvement with baseline bootstrapping

R Laroche, P Trichelair… - … conference on machine …, 2019 - proceedings.mlr.press

This paper considers Safe Policy Improvement (SPI) in Batch Reinforcement
Learning (Batch RL): from a fixed dataset and without direct access to the true environment …

Cited by 246 · All 8 versions

[PDF] mlr.press

Stochastic variance-reduced policy gradient

M Papini, D Binaghi, G Canonaco… - International …, 2018 - proceedings.mlr.press

In this paper, we propose a novel reinforcement-learning algorithm consisting in a stochastic
variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs) …

Cited by 216 · All 11 versions

[PDF] aminer.cn

OnRL: improving mobile video telephony via online reinforcement learning

H Zhang, A Zhou, J Lu, R Ma, Y Hu, C Li… - Proceedings of the 26th …, 2020 - dl.acm.org

Machine learning models, particularly reinforcement learning (RL), have demonstrated great
potential in optimizing video streaming applications. However, the state-of-the-art solutions …

Cited by 115 · All 2 versions

[PDF] arxiv.org

Sample efficient policy gradient methods with recursive variance reduction

P Xu, F Gao, Q Gu - arXiv preprint arXiv:1909.08610, 2019 - arxiv.org

Improving the sample efficiency in reinforcement learning has been a long-standing
research problem. In this work, we aim to reduce the sample complexity of existing policy …

Cited by 111 · All 7 versions

[PDF] mlr.press

An improved convergence analysis of stochastic variance-reduced policy gradient

P Xu, F Gao, Q Gu - Uncertainty in Artificial Intelligence, 2020 - proceedings.mlr.press

We revisit the stochastic variance-reduced policy gradient (SVRPG) method proposed
by Papini et al. (2018) for reinforcement learning. We provide an improved …

Cited by 122 · All 6 versions
