A general sample complexity analysis of vanilla policy gradient
We adapt recent tools developed for the analysis of Stochastic Gradient Descent (SGD) in
non-convex optimization to obtain convergence and sample complexity guarantees for the …
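For context, the vanilla policy gradient this analysis concerns is the REINFORCE-style estimator; a standard statement (standard notation, not taken from the truncated abstract) is
\[
\nabla_\theta J(\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[\sum_{t=0}^{T-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t)\, R(\tau)\right],
\]
with the SGD-style ascent update \(\theta_{k+1} = \theta_k + \eta\, \hat{g}_k\), where \(\hat{g}_k\) estimates the gradient from a batch of sampled trajectories. The smoothness and bounded-variance assumptions under which such updates are analyzed come from the non-convex SGD literature the abstract alludes to.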
A novel framework for policy mirror descent with general parameterization and linear convergence
Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe
their success to the use of parameterized policies. However, while theoretical guarantees …
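As background, a generic policy mirror descent step (the textbook form, not necessarily this paper's exact update) solves a regularized greedy step per state:
\[
\pi_{k+1}(\cdot \mid s) \in \arg\max_{\pi(\cdot \mid s) \in \Delta(\mathcal{A})} \Big\{ \langle Q^{\pi_k}(s, \cdot),\, \pi(\cdot \mid s) \rangle - \tfrac{1}{\eta_k}\, D\big(\pi(\cdot \mid s),\, \pi_k(\cdot \mid s)\big) \Big\},
\]
where \(D\) is a Bregman divergence, most commonly the KL divergence.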
Sample and communication-efficient decentralized actor-critic algorithms with finite-time analysis
Actor-critic (AC) algorithms have been widely used in decentralized multi-agent systems to
learn the optimal joint control policy. However, existing decentralized AC algorithms either …
Enhanced bilevel optimization via Bregman distance
Bilevel optimization has been recently used in many machine learning problems such as
hyperparameter optimization, policy optimization, and meta-learning. Although many bilevel …
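For readers unfamiliar with the setting, the generic bilevel problem referenced here has the standard nested form
\[
\min_{x} \; f\big(x,\, y^*(x)\big) \quad \text{s.t.} \quad y^*(x) \in \arg\min_{y} \; g(x, y),
\]
with \(f\) the upper-level objective (e.g., validation loss) and \(g\) the lower-level objective (e.g., training loss).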
Improving proximal policy optimization with alpha divergence
Proximal policy optimization (PPO) is a recent advancement in reinforcement learning,
which is formulated as an unconstrained optimization problem including two terms …
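The abstract is truncated, but the two terms in the standard KL-penalized PPO objective are the importance-weighted surrogate reward and a divergence penalty:
\[
\max_\theta \; \mathbb{E}_t\!\left[\frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}\, \hat{A}_t\right] - \beta\, \mathbb{E}_t\!\left[\mathrm{KL}\big(\pi_{\theta_{\mathrm{old}}}(\cdot \mid s_t)\,\big\|\,\pi_\theta(\cdot \mid s_t)\big)\right],
\]
where \(\hat{A}_t\) is an advantage estimate and \(\beta\) a penalty coefficient. As the title suggests, the paper presumably replaces the KL penalty with an alpha divergence.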
Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment
Structural model pruning is a prominent approach used for reducing the computational cost
of Convolutional Neural Networks (CNNs) before their deployment on resource-constrained …
Policy optimization with stochastic mirror descent
Improving sample efficiency has been a longstanding goal in reinforcement learning. This
paper proposes the VRMPO algorithm: a sample-efficient policy gradient method with stochastic …
Geometry and convergence of natural policy gradient methods
We study the convergence of several natural policy gradient (NPG) methods in infinite-
horizon discounted Markov decision processes with regular policy parametrizations. For a …
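For context, the natural policy gradient preconditions the vanilla gradient with the (pseudo-inverse of the) Fisher information matrix of the policy:
\[
\theta_{k+1} = \theta_k + \eta\, F(\theta_k)^{\dagger}\, \nabla_\theta J(\theta_k), \qquad F(\theta) = \mathbb{E}_{s,a}\big[\nabla_\theta \log \pi_\theta(a \mid s)\, \nabla_\theta \log \pi_\theta(a \mid s)^{\top}\big].
\]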
Taming Nonconvex Stochastic Mirror Descent with General Bregman Divergence
This paper revisits the convergence of Stochastic Mirror Descent (SMD) in the contemporary
nonconvex optimization setting. Existing results for batch-free nonconvex SMD restrict the …
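As a reminder of the objects involved (standard definitions, not specific to this paper), SMD updates through a mirror map \(\psi\), and the Bregman divergence it induces is
\[
x_{k+1} = \arg\min_{x} \Big\{ \langle g_k, x \rangle + \tfrac{1}{\eta_k}\, D_\psi(x, x_k) \Big\}, \qquad D_\psi(x, y) = \psi(x) - \psi(y) - \langle \nabla \psi(y),\, x - y \rangle,
\]
where \(g_k\) is a stochastic gradient at \(x_k\).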
Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods
Markov Decision Processes (MDPs) are a formal framework for modeling and solving
sequential decision-making problems. In finite-time horizons, such problems are relevant for …
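The softmax parameterization studied in this line of work is, in its standard tabular form,
\[
\pi_\theta(a \mid s) = \frac{\exp(\theta_{s,a})}{\sum_{a' \in \mathcal{A}} \exp(\theta_{s,a'})},
\]
with one parameter \(\theta_{s,a}\) per state-action pair; the abstract is truncated, so the paper's exact setting may be more general.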