A Lyapunov-based approach to safe reinforcement learning

Y Chow, O Nachum… - Advances in neural …, 2018 - proceedings.neurips.cc
In many real-world reinforcement learning (RL) problems, besides optimizing the main
objective function, an agent must concurrently avoid violating a number of constraints. In …

Variational policy gradient method for reinforcement learning with general utilities

J Zhang, A Koppel, AS Bedi… - Advances in Neural …, 2020 - proceedings.neurips.cc
In recent years, reinforcement learning systems with general goals beyond a cumulative
sum of rewards have gained traction, such as in constrained problems, exploration, and …

Adaptive trust region policy optimization: Global convergence and faster rates for regularized MDPs

L Shani, Y Efroni, S Mannor - Proceedings of the AAAI Conference on …, 2020 - ojs.aaai.org
Trust region policy optimization (TRPO) is a popular and empirically successful policy
search algorithm in Reinforcement Learning (RL) in which a surrogate problem, that restricts …

Global convergence of policy gradient methods to (almost) locally optimal policies

K Zhang, A Koppel, H Zhu, T Basar - SIAM Journal on Control and …, 2020 - SIAM
Policy gradient (PG) methods have been one of the most essential ingredients of
reinforcement learning, with application in a variety of domains. In spite of the empirical …

An improved analysis of (variance-reduced) policy gradient and natural policy gradient methods

Y Liu, K Zhang, T Basar, W Yin - Advances in Neural …, 2020 - proceedings.neurips.cc
In this paper, we revisit and improve the convergence of policy gradient (PG), natural PG
(NPG) methods, and their variance-reduced variants, under general smooth policy …

Safe policy improvement with baseline bootstrapping

R Laroche, P Trichelair… - … conference on machine …, 2019 - proceedings.mlr.press
This paper considers Safe Policy Improvement (SPI) in Batch Reinforcement
Learning (Batch RL): from a fixed dataset and without direct access to the true environment …

Stochastic variance-reduced policy gradient

M Papini, D Binaghi, G Canonaco… - International …, 2018 - proceedings.mlr.press
In this paper, we propose a novel reinforcement-learning algorithm consisting in a stochastic
variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs) …

OnRL: improving mobile video telephony via online reinforcement learning

H Zhang, A Zhou, J Lu, R Ma, Y Hu, C Li… - Proceedings of the 26th …, 2020 - dl.acm.org
Machine learning models, particularly reinforcement learning (RL), have demonstrated great
potential in optimizing video streaming applications. However, the state-of-the-art solutions …

Sample efficient policy gradient methods with recursive variance reduction

P Xu, F Gao, Q Gu - arXiv preprint arXiv:1909.08610, 2019 - arxiv.org
Improving the sample efficiency in reinforcement learning has been a long-standing
research problem. In this work, we aim to reduce the sample complexity of existing policy …

An improved convergence analysis of stochastic variance-reduced policy gradient

P Xu, F Gao, Q Gu - Uncertainty in Artificial Intelligence, 2020 - proceedings.mlr.press
We revisit the stochastic variance-reduced policy gradient (SVRPG) method proposed
by Papini et al. (2018) for reinforcement learning. We provide an improved …