A general sample complexity analysis of vanilla policy gradient

R Yuan, RM Gower, A Lazaric - International Conference on …, 2022 - proceedings.mlr.press
We adapt recent tools developed for the analysis of Stochastic Gradient Descent (SGD) in
non-convex optimization to obtain convergence and sample complexity guarantees for the …
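For context (a standard background sketch, not taken from the truncated abstract above): the vanilla policy gradient estimator that such SGD-style analyses typically study is the REINFORCE/GPOMDP form of the policy gradient theorem, paired with a plain stochastic ascent step.

```latex
% Background sketch (standard definitions, assumed rather than quoted from the paper):
% J(\theta) is the expected discounted return of the parameterized policy \pi_\theta,
% and the reward-to-go form of the policy gradient theorem gives the estimator below.
\[
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[
      \sum_{t \ge 0} \nabla_\theta \log \pi_\theta(a_t \mid s_t)
      \sum_{t' \ge t} \gamma^{t'} r(s_{t'}, a_{t'})
    \right],
\qquad
\theta_{k+1} = \theta_k + \eta \, \widehat{\nabla}_\theta J(\theta_k),
\]
% where \widehat{\nabla}_\theta J is a Monte Carlo estimate from sampled trajectories;
% non-convex SGD tools then bound quantities such as \min_k \mathbb{E}\|\nabla J(\theta_k)\|^2.
```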

A novel framework for policy mirror descent with general parameterization and linear convergence

C Alfano, R Yuan, P Rebeschini - Advances in Neural …, 2023 - proceedings.neurips.cc
Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe
their success to the use of parameterized policies. However, while theoretical guarantees …
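As background for the title (a generic sketch with assumed notation, not the paper's exact scheme), a policy mirror descent step trades off the current action values against a Bregman divergence to the previous policy:

```latex
% Generic policy mirror descent step (background sketch, notation assumed):
% Q^{\pi_k} is the state-action value function of the current policy, D_h the Bregman
% divergence induced by a mirror map h, \eta_k a step size, \Delta(\mathcal{A}) the simplex.
\[
\pi_{k+1}(\cdot \mid s)
  \in \arg\max_{p \in \Delta(\mathcal{A})}
      \Big\{ \eta_k \langle Q^{\pi_k}(s, \cdot),\, p \rangle
             - D_h\big(p, \pi_k(\cdot \mid s)\big) \Big\}
  \quad \text{for every state } s.
\]
```

Taking h to be the negative entropy yields the familiar multiplicative (softmax/NPG-style) update; per the title, the paper's framework extends such updates to general policy parameterizations with linear convergence guarantees.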

Sample and communication-efficient decentralized actor-critic algorithms with finite-time analysis

Z Chen, Y Zhou, RR Chen… - … Conference on Machine …, 2022 - proceedings.mlr.press
Actor-critic (AC) algorithms have been widely used in decentralized multi-agent systems to
learn the optimal joint control policy. However, existing decentralized AC algorithms either …
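For reference (a single-agent textbook sketch; the decentralized, communication-efficient variant is the paper's contribution), finite-time analyses of actor-critic usually consider coupled updates of this shape:

```latex
% Textbook actor-critic updates (single-agent background sketch, not the paper's algorithm):
% the critic runs a TD(0) step on value parameters w, the actor a policy-gradient step on \theta,
% with \delta_t the temporal-difference error and \alpha_t, \beta_t step sizes.
\[
\delta_t = r_t + \gamma V_{w_t}(s_{t+1}) - V_{w_t}(s_t),
\qquad
w_{t+1} = w_t + \beta_t \, \delta_t \, \nabla_w V_{w_t}(s_t),
\qquad
\theta_{t+1} = \theta_t + \alpha_t \, \delta_t \, \nabla_\theta \log \pi_{\theta_t}(a_t \mid s_t).
\]
```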

Enhanced bilevel optimization via Bregman distance

F Huang, J Li, S Gao, H Huang - Advances in Neural …, 2022 - proceedings.neurips.cc
Bilevel optimization has been recently used in many machine learning problems such as
hyperparameter optimization, policy optimization, and meta learning. Although many bilevel …
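As a reminder of the problem class (generic formulation, not the paper's specific algorithm), bilevel optimization nests an inner problem inside an outer one:

```latex
% Generic bilevel problem (background sketch): the outer objective f is evaluated at a
% minimizer of the inner objective g, e.g. hyperparameters x versus model weights y.
\[
\min_{x \in \mathcal{X}} \; f\big(x, y^{*}(x)\big)
\quad \text{subject to} \quad
y^{*}(x) \in \arg\min_{y} \; g(x, y).
\]
```

The title suggests the proposed updates are built around a Bregman distance rather than the usual Euclidean proximal term; the specifics lie in the truncated text.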

Improving proximal policy optimization with alpha divergence

H Xu, Z Yan, J Xuan, G Zhang, J Lu - Neurocomputing, 2023 - Elsevier
Proximal policy optimization (PPO) is a recent advance in reinforcement learning that is
formulated as an unconstrained optimization problem including two terms …
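For orientation (standard PPO background with assumed notation; the alpha-divergence modification is the paper's subject), the penalty form of PPO maximizes a surrogate advantage term minus a divergence penalty that keeps the new policy close to the old one:

```latex
% Standard PPO penalty objective (background sketch, not the paper's alpha-divergence variant):
% a likelihood-ratio surrogate of the advantage A plus a KL penalty with coefficient \beta;
% the paper presumably studies replacing the KL term with an alpha divergence.
\[
\max_{\theta} \;
\mathbb{E}_{s, a \sim \pi_{\theta_{\mathrm{old}}}}\!
\left[ \frac{\pi_\theta(a \mid s)}{\pi_{\theta_{\mathrm{old}}}(a \mid s)} \, A^{\pi_{\theta_{\mathrm{old}}}}(s, a) \right]
\;-\; \beta \, \mathbb{E}_{s}\!
\left[ D_{\mathrm{KL}}\big(\pi_{\theta_{\mathrm{old}}}(\cdot \mid s) \,\|\, \pi_\theta(\cdot \mid s)\big) \right].
\]
```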

Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment

A Ganjdanesh, S Gao, H Huang - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Structural model pruning is a prominent approach used for reducing the computational cost
of Convolutional Neural Networks (CNNs) before their deployment on resource-constrained …

Policy optimization with stochastic mirror descent

L Yang, Y Zhang, G Zheng, Q Zheng, P Li… - Proceedings of the …, 2022 - ojs.aaai.org
Improving sample efficiency has been a longstanding goal in reinforcement learning. This
paper proposes the VRMPO algorithm: a sample-efficient policy gradient method with stochastic …

Geometry and convergence of natural policy gradient methods

J Müller, G Montúfar - Information Geometry, 2024 - Springer
We study the convergence of several natural policy gradient (NPG) methods in infinite-
horizon discounted Markov decision processes with regular policy parametrizations. For a …
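As background (the textbook update, not necessarily the exact variants analysed), natural policy gradient preconditions the vanilla gradient with the Fisher information of the policy:

```latex
% Textbook natural policy gradient update (background sketch):
% F(\theta) is the Fisher information matrix of \pi_\theta under its state-action distribution,
% F^{\dagger} its Moore-Penrose pseudoinverse, and \eta a step size.
\[
F(\theta) = \mathbb{E}_{s, a}\!\left[
  \nabla_\theta \log \pi_\theta(a \mid s) \,
  \nabla_\theta \log \pi_\theta(a \mid s)^{\top} \right],
\qquad
\theta_{k+1} = \theta_k + \eta \, F(\theta_k)^{\dagger} \, \nabla_\theta J(\theta_k).
\]
```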

Taming Nonconvex Stochastic Mirror Descent with General Bregman Divergence

I Fatkhullin, N He - International Conference on Artificial …, 2024 - proceedings.mlr.press
This paper revisits the convergence of Stochastic Mirror Descent (SMD) in the contemporary
nonconvex optimization setting. Existing results for batch-free nonconvex SMD restrict the …
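For reference (generic form with assumed notation), a stochastic mirror descent step with a Bregman divergence D_h looks as follows; taking h as the squared Euclidean norm recovers plain SGD:

```latex
% Generic stochastic mirror descent step (background sketch): g_k is a stochastic gradient
% at x_k, D_h the Bregman divergence of a mirror map h, and \eta_k a step size.
\[
x_{k+1} \in \arg\min_{x \in \mathcal{X}}
  \Big\{ \langle g_k, x \rangle + \tfrac{1}{\eta_k} D_h(x, x_k) \Big\},
\qquad
D_h(x, y) = h(x) - h(y) - \langle \nabla h(y),\, x - y \rangle.
\]
```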

Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods

S Klein, S Weissmann, L Döring - arXiv preprint arXiv:2310.02671, 2023 - arxiv.org
Markov Decision Processes (MDPs) are a formal framework for modeling and solving
sequential decision-making problems. In finite-time horizons such problems are relevant for …
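For context (the standard definition, not this paper's specific finite-horizon analysis), the tabular softmax parametrization behind softmax policy gradient methods assigns one parameter per state-action pair:

```latex
% Tabular softmax policy (standard background sketch): one parameter \theta_{s,a} per
% state-action pair, trained by gradient ascent on the expected return J.
\[
\pi_\theta(a \mid s) = \frac{\exp(\theta_{s,a})}{\sum_{a' \in \mathcal{A}} \exp(\theta_{s,a'})},
\qquad
\theta_{k+1} = \theta_k + \eta \, \nabla_\theta J(\theta_k).
\]
```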