An improved analysis of (variance-reduced) policy gradient and natural policy gradient methods
In this paper, we revisit and improve the convergence of policy gradient (PG), natural PG
(NPG) methods, and their variance-reduced variants, under general smooth policy …
Sample efficient reinforcement learning with REINFORCE
Policy gradient methods are among the most effective methods for large-scale reinforcement
learning, and their empirical success has prompted several works that develop the …
Improved sample complexity analysis of natural policy gradient algorithm with general parameterization for infinite horizon discounted reward Markov decision …
We consider the problem of designing sample efficient learning algorithms for infinite
horizon discounted reward Markov Decision Process. Specifically, we propose the …
Momentum-based policy gradient methods
In the paper, we propose a class of efficient momentum-based policy gradient methods for
the model-free reinforcement learning, which use adaptive learning rates and do not require …
On the hidden biases of policy mirror ascent in continuous action spaces
We focus on parameterized policy search for reinforcement learning over continuous action
spaces. Typically, one assumes the score function associated with a policy is bounded …
PAGE-PG: A simple and loopless variance-reduced policy gradient method with probabilistic gradient estimation
Despite their success, policy gradient methods suffer from high variance of the gradient
estimator, which can result in unsatisfactory sample complexity. Recently, numerous …
Efficient privacy-preserving stochastic nonconvex optimization
While many solutions for privacy-preserving convex empirical risk minimization (ERM) have
been developed, privacy-preserving nonconvex ERM remains a challenge. We study …
Smoothing policies and safe policy gradients
Policy gradient (PG) algorithms are among the best candidates for the much-anticipated
applications of reinforcement learning to real-world control tasks, such as robotics. However …
Adaptive stochastic ADMM for decentralized reinforcement learning in edge IoT
Edge computing provides a promising paradigm to support the implementation of Internet of
Things (IoT) by offloading tasks to nearby edge nodes. Meanwhile, the increasing network …
Dealing with sparse rewards in continuous control robotics via heavy-tailed policies
In this paper, we present a novel Heavy-Tailed Stochastic Policy Gradient (HT-PSG)
algorithm to deal with the challenges of sparse rewards in continuous control problems …