Fast global convergence of natural policy gradient methods with entropy regularization

S Cen, C Cheng, Y Chen, Y Wei… - Operations …, 2022 - pubsonline.informs.org
Natural policy gradient (NPG) methods are among the most widely used policy optimization
algorithms in contemporary reinforcement learning. This class of methods is often applied in …

Neural policy gradient methods: Global optimality and rates of convergence

L Wang, Q Cai, Z Yang, Z Wang - ar…

V Dewanto, G Dunn, A Eshragh, M Gallagher… - arXiv preprint arXiv …, 2020 - arxiv.org
Reinforcement learning is an important part of artificial intelligence. In this paper, we review
model-free reinforcement learning that utilizes the average reward optimality criterion in the …

Distributed learning in the nonconvex world: From batch data to streaming and beyond

TH Chang, M Hong, HT Wai… - IEEE Signal Processing …, 2020 - ieeexplore.ieee.org
Distributed learning has become a critical enabler of the massively connected world that
many people envision. This article discusses four key elements of scalable distributed …

On the bias-variance-cost tradeoff of stochastic optimization

Y Hu, X Chen, N He - Advances in Neural Information …, 2021 - proceedings.neurips.cc
We consider stochastic optimization when one only has access to biased stochastic oracles
of the objective, and obtaining stochastic gradients with low biases comes at high costs. This …

Biased stochastic first-order methods for conditional stochastic optimization and applications in meta learning

Y Hu, S Zhang, X Chen, N He - Advances in Neural …, 2020 - proceedings.neurips.cc
Conditional stochastic optimization covers a variety of applications ranging from invariant
learning and causal inference to meta-learning. However, constructing unbiased gradient …

Non-asymptotic convergence analysis of two time-scale (natural) actor-critic algorithms

T Xu, Z Wang, Y Liang - arXiv preprint arXiv:2005.03557, 2020 - arxiv.org
As an important type of reinforcement learning algorithms, actor-critic (AC) and natural actor-
critic (NAC) algorithms are often executed in two ways for finding optimal policies. In the first …

Multi-agent performative prediction with greedy deployment and consensus seeking agents

Q Li, CY Yau, HT Wai - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We consider a scenario where multiple agents are learning a common decision vector from
data which can be influenced by the agents' decisions. This leads to the problem of multi …