An improved analysis of (variance-reduced) policy gradient and natural policy gradient methods

Y Liu, K Zhang, T Basar, W Yin - Advances in Neural …, 2020 - proceedings.neurips.cc
In this paper, we revisit and improve the convergence of policy gradient (PG), natural PG
(NPG) methods, and their variance-reduced variants, under general smooth policy …

Sample efficient reinforcement learning with REINFORCE

J Zhang, J Kim, B O'Donoghue, S Boyd - Proceedings of the AAAI …, 2021 - ojs.aaai.org
Policy gradient methods are among the most effective methods for large-scale reinforcement
learning, and their empirical success has prompted several works that develop the …

Improved sample complexity analysis of natural policy gradient algorithm with general parameterization for infinite horizon discounted reward Markov decision …

WU Mondal, V Aggarwal - International Conference on …, 2024 - proceedings.mlr.press
We consider the problem of designing sample efficient learning algorithms for infinite
horizon discounted reward Markov Decision Process. Specifically, we propose the …

Momentum-based policy gradient methods

F Huang, S Gao, J Pei, H Huang - … conference on machine …, 2020 - proceedings.mlr.press
In the paper, we propose a class of efficient momentum-based policy gradient methods for
the model-free reinforcement learning, which use adaptive learning rates and do not require …

On the hidden biases of policy mirror ascent in continuous action spaces

AS Bedi, S Chakraborty, A Parayil… - International …, 2022 - proceedings.mlr.press
We focus on parameterized policy search for reinforcement learning over continuous action
spaces. Typically, one assumes the score function associated with a policy is bounded …

PAGE-PG: A simple and loopless variance-reduced policy gradient method with probabilistic gradient estimation

M Gargiani, A Zanelli, A Martinelli… - International …, 2022 - proceedings.mlr.press
Despite their success, policy gradient methods suffer from high variance of the gradient
estimator, which can result in unsatisfactory sample complexity. Recently, numerous …

Efficient privacy-preserving stochastic nonconvex optimization

L Wang, B Jayaraman, D Evans… - Uncertainty in Artificial …, 2023 - proceedings.mlr.press
While many solutions for privacy-preserving convex empirical risk minimization (ERM) have
been developed, privacy-preserving nonconvex ERM remains a challenge. We study …

Smoothing policies and safe policy gradients

M Papini, M Pirotta, M Restelli - Machine Learning, 2022 - Springer
Policy gradient (PG) algorithms are among the best candidates for the much-anticipated
applications of reinforcement learning to real-world control tasks, such as robotics. However …

Adaptive stochastic ADMM for decentralized reinforcement learning in edge IoT

W Lei, Y Ye, M Xiao, M Skoglund… - IEEE Internet of Things …, 2022 - ieeexplore.ieee.org
Edge computing provides a promising paradigm to support the implementation of Internet of
Things (IoT) by offloading tasks to nearby edge nodes. Meanwhile, the increasing network …

Dealing with sparse rewards in continuous control robotics via heavy-tailed policies

S Chakraborty, AS Bedi, A Koppel, P Tokekar… - arXiv preprint arXiv …, 2022 - arxiv.org
In this paper, we present a novel Heavy-Tailed Stochastic Policy Gradient (HT-PSG)
algorithm to deal with the challenges of sparse rewards in continuous control problems …