A general sample complexity analysis of vanilla policy gradient

R Yuan, RM Gower, A Lazaric - International Conference on …, 2022 - proceedings.mlr.press
We adapt recent tools developed for the analysis of Stochastic Gradient Descent (SGD) in
non-convex optimization to obtain convergence and sample complexity guarantees for the …
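For context (a standard background sketch, not taken from the truncated abstract above): the vanilla policy gradient estimator that such SGD-style analyses typically study is the REINFORCE/GPOMDP form of the policy gradient theorem, paired with a plain stochastic ascent step.

```latex
% Background sketch (standard definitions, assumed rather than quoted from the paper):
% J(\theta) is the expected discounted return of the parameterized policy \pi_\theta,
% and the reward-to-go form of the policy gradient theorem gives the estimator below.
\[
\nabla_\theta J(\theta)
  = \mathbb{E}_{\tau \sim \pi_\theta}\!\left[
      \sum_{t \ge 0} \nabla_\theta \log \pi_\theta(a_t \mid s_t)
      \sum_{t' \ge t} \gamma^{t'} r(s_{t'}, a_{t'})
    \right],
\qquad
\theta_{k+1} = \theta_k + \eta \, \widehat{\nabla}_\theta J(\theta_k),
\]
% where \widehat{\nabla}_\theta J is a Monte Carlo estimate from sampled trajectories;
% non-convex SGD tools then bound quantities such as \min_k \mathbb{E}\|\nabla J(\theta_k)\|^2.
```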

A novel framework for policy mirror descent with general parameterization and linear convergence

C Alfano, R Yuan, P Rebeschini - Advances in Neural …, 2023 - proceedings.neurips.cc
Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe
their success to the use of parameterized policies. However, while theoretical guarantees …
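As background for the title (a generic sketch with assumed notation, not the paper's exact scheme), a policy mirror descent step trades off the current action values against a Bregman divergence to the previous policy:

```latex
% Generic policy mirror descent step (background sketch, notation assumed):
% Q^{\pi_k} is the state-action value function of the current policy, D_h the Bregman
% divergence induced by a mirror map h, \eta_k a step size, \Delta(\mathcal{A}) the simplex.
\[
\pi_{k+1}(\cdot \mid s)
  \in \arg\max_{p \in \Delta(\mathcal{A})}
      \Big\{ \eta_k \langle Q^{\pi_k}(s, \cdot),\, p \rangle
             - D_h\big(p, \pi_k(\cdot \mid s)\big) \Big\}
  \quad \text{for every state } s.
\]
```

Taking h to be the negative entropy yields the familiar multiplicative (softmax/NPG-style) update; per the title, the paper's framework extends such updates to general policy parameterizations with linear convergence guarantees.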

Sample and communication-efficient decentralized actor-critic algorithms with finite-time analysis

Z Chen, Y Zhou, RR Chen… - … Conference on Machine …, 2022 - proceedings.mlr.press
Actor-critic (AC) algorithms have been widely used in decentralized multi-agent systems to
learn the optimal joint control policy. However, existing decentralized AC algorithms either …
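For reference (a single-agent textbook sketch; the decentralized, communication-efficient variant is the paper's contribution), finite-time analyses of actor-critic usually consider coupled updates of this shape:

```latex
% Textbook actor-critic updates (single-agent background sketch, not the paper's algorithm):
% the critic runs a TD(0) step on value parameters w, the actor a policy-gradient step on \theta,
% with \delta_t the temporal-difference error and \alpha_t, \beta_t step sizes.
\[
\delta_t = r_t + \gamma V_{w_t}(s_{t+1}) - V_{w_t}(s_t),
\qquad
w_{t+1} = w_t + \beta_t \, \delta_t \, \nabla_w V_{w_t}(s_t),
\qquad
\theta_{t+1} = \theta_t + \alpha_t \, \delta_t \, \nabla_\theta \log \pi_{\theta_t}(a_t \mid s_t).
\]
```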

Enhanced bilevel optimization via Bregman distance

F Huang, J Li, S Gao, H Huang - Advances in Neural …, 2022 - proceedings.neurips.cc
Bilevel optimization has been recently used in many machine learning problems such as
hyperparameter optimization, policy optimization, and meta learning. Although many bilevel …
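As a reminder of the problem class (generic formulation, not the paper's specific algorithm), bilevel optimization nests an inner problem inside an outer one:

```latex
% Generic bilevel problem (background sketch): the outer objective f is evaluated at a
% minimizer of the inner objective g, e.g. hyperparameters x versus model weights y.
\[
\min_{x \in \mathcal{X}} \; f\big(x, y^{*}(x)\big)
\quad \text{subject to} \quad
y^{*}(x) \in \arg\min_{y} \; g(x, y).
\]
```

The title suggests the proposed updates are built around a Bregman distance rather than the usual Euclidean proximal term; the specifics lie in the truncated text.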

Improving proximal policy optimization with alpha divergence

H Xu, Z Yan, J Xuan, G Zhang, J Lu - Neurocomputing, 2023 - Elsevier
Proximal policy optimization (PPO) is a recent advance in reinforcement learning that is
formulated as an unconstrained optimization problem including two terms …
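For orientation (standard PPO background with assumed notation; the alpha-divergence modification is the paper's subject), the penalty form of PPO maximizes a surrogate advantage term minus a divergence penalty that keeps the new policy close to the old one:

```latex
% Standard PPO penalty objective (background sketch, not the paper's alpha-divergence variant):
% a likelihood-ratio surrogate of the advantage A plus a KL penalty with coefficient \beta;
% the paper presumably studies replacing the KL term with an alpha divergence.
\[
\max_{\theta} \;
\mathbb{E}_{s, a \sim \pi_{\theta_{\mathrm{old}}}}\!
\left[ \frac{\pi_\theta(a \mid s)}{\pi_{\theta_{\mathrm{old}}}(a \mid s)} \, A^{\pi_{\theta_{\mathrm{old}}}}(s, a) \right]
\;-\; \beta \, \mathbb{E}_{s}\!
\left[ D_{\mathrm{KL}}\big(\pi_{\theta_{\mathrm{old}}}(\cdot \mid s) \,\|\, \pi_\theta(\cdot \mid s)\big) \right].
\]
```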

Jointly Training and Pruning CNNs via Learnable Agent Guidance and Alignment

A Ganjdanesh, S Gao, H Huang - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Structural model pruning is a prominent approach used for reducing the computational cost
of Convolutional Neural Networks (CNNs) before their deployment on resource-constrained …

Policy optimization with stochastic mirror descent

L Yang, Y Zhang, G Zheng, Q Zheng, P Li… - Proceedings of the …, 2022 - ojs.aaai.org
Improving sample efficiency has been a longstanding goal in reinforcement learning. This
paper proposes the VRMPO algorithm: a sample-efficient policy gradient method with stochastic …

Geometry and convergence of natural policy gradient methods

J Müller, G Montúfar - Information Geometry, 2024 - Springer
We study the convergence of several natural policy gradient (NPG) methods in infinite-
horizon discounted Markov decision processes with regular policy parametrizations. For a …
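As background (the textbook update, not necessarily the exact variants analysed), natural policy gradient preconditions the vanilla gradient with the Fisher information of the policy:

```latex
% Textbook natural policy gradient update (background sketch):
% F(\theta) is the Fisher information matrix of \pi_\theta under its state-action distribution,
% F^{\dagger} its Moore-Penrose pseudoinverse, and \eta a step size.
\[
F(\theta) = \mathbb{E}_{s, a}\!\left[
  \nabla_\theta \log \pi_\theta(a \mid s) \,
  \nabla_\theta \log \pi_\theta(a \mid s)^{\top} \right],
\qquad
\theta_{k+1} = \theta_k + \eta \, F(\theta_k)^{\dagger} \, \nabla_\theta J(\theta_k).
\]
```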

Taming Nonconvex Stochastic Mirror Descent with General Bregman Divergence

I Fatkhullin, N He - International Conference on Artificial …, 2024 - proceedings.mlr.press
This paper revisits the convergence of Stochastic Mirror Descent (SMD) in the contemporary
nonconvex optimization setting. Existing results for batch-free nonconvex SMD restrict the …
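For reference (generic form with assumed notation), a stochastic mirror descent step with a Bregman divergence D_h looks as follows; taking h as the squared Euclidean norm recovers plain SGD:

```latex
% Generic stochastic mirror descent step (background sketch): g_k is a stochastic gradient
% at x_k, D_h the Bregman divergence of a mirror map h, and \eta_k a step size.
\[
x_{k+1} \in \arg\min_{x \in \mathcal{X}}
  \Big\{ \langle g_k, x \rangle + \tfrac{1}{\eta_k} D_h(x, x_k) \Big\},
\qquad
D_h(x, y) = h(x) - h(y) - \langle \nabla h(y),\, x - y \rangle.
\]
```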

Beyond Stationarity: Convergence Analysis of Stochastic Softmax Policy Gradient Methods

S Klein, S Weissmann, L Döring - arXiv preprint arXiv:2310.02671, 2023 - arxiv.org
Markov Decision Processes (MDPs) are a formal framework for modeling and solving
sequential decision-making problems. In finite-time horizons such problems are relevant for …
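For context (the standard definition, not this paper's specific finite-horizon analysis), the tabular softmax parametrization behind softmax policy gradient methods assigns one parameter per state-action pair:

```latex
% Tabular softmax policy (standard background sketch): one parameter \theta_{s,a} per
% state-action pair, trained by gradient ascent on the expected return J.
\[
\pi_\theta(a \mid s) = \frac{\exp(\theta_{s,a})}{\sum_{a' \in \mathcal{A}} \exp(\theta_{s,a'})},
\qquad
\theta_{k+1} = \theta_k + \eta \, \nabla_\theta J(\theta_k).
\]
```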