A Lyapunov-based approach to safe reinforcement learning
In many real-world reinforcement learning (RL) problems, besides optimizing the main
objective function, an agent must concurrently avoid violating a number of constraints. In …
Variational policy gradient method for reinforcement learning with general utilities
In recent years, reinforcement learning systems with general goals beyond a cumulative
sum of rewards have gained traction, such as in constrained problems, exploration, and …
Adaptive trust region policy optimization: Global convergence and faster rates for regularized MDPs
Trust region policy optimization (TRPO) is a popular and empirically successful policy
search algorithm in Reinforcement Learning (RL) in which a surrogate problem, that restricts …
Global convergence of policy gradient methods to (almost) locally optimal policies
Policy gradient (PG) methods have been one of the most essential ingredients of
reinforcement learning, with application in a variety of domains. In spite of the empirical …
An improved analysis of (variance-reduced) policy gradient and natural policy gradient methods
In this paper, we revisit and improve the convergence of policy gradient (PG), natural PG
(NPG) methods, and their variance-reduced variants, under general smooth policy …
Safe policy improvement with baseline bootstrapping
This paper considers Safe Policy Improvement (SPI) in Batch Reinforcement
Learning (Batch RL): from a fixed dataset and without direct access to the true environment …
Stochastic variance-reduced policy gradient
In this paper, we propose a novel reinforcement-learning algorithm consisting in a stochastic
variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs) …
OnRL: improving mobile video telephony via online reinforcement learning
Machine learning models, particularly reinforcement learning (RL), have demonstrated great
potential in optimizing video streaming applications. However, the state-of-the-art solutions …
Sample efficient policy gradient methods with recursive variance reduction
Improving the sample efficiency in reinforcement learning has been a long-standing
research problem. In this work, we aim to reduce the sample complexity of existing policy …
An improved convergence analysis of stochastic variance-reduced policy gradient
We revisit the stochastic variance-reduced policy gradient (SVRPG) method proposed
by Papini et al. (2018) for reinforcement learning. We provide an improved …