- Academic Search

Y Yang, J Wang - arxiv preprint arxiv:2011.00583, 2020 - arxiv.org

Following the remarkable success of the AlphaGO series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …

Save Cite Cited by 352 Related articles All 2 versions Free GPT-4 DeepSeek View as HTML

[Free GPT-4]
[DeepSeek]

[PDF] annualreviews.org

Toward a theoretical foundation of policy optimization for learning control policies

B Hu, K Zhang, N Li, M Mesbahi… - Annual Review of …, 2023 - annualreviews.org

Gradient-based methods have been widely used for system design and optimization in
diverse application domains. Recently, there has been a renewed interest in studying …

Save Cite Cited by 87 Related articles All 6 versions Free GPT-4 DeepSeek

[Free GPT-4]
[DeepSeek]

[PDF] neurips.cc

Bellman eluder dimension: New rich classes of rl problems, and sample-efficient algorithms

C **, Q Liu, S Miryoosefi - Advances in neural information …, 2021 - proceedings.neurips.cc

Finding the minimal structural assumptions that empower sample-efficient learning is one of
the most important research directions in Reinforcement Learning (RL). This paper …

Save Cite Cited by 266 Related articles All 11 versions Free GPT-4 DeepSeek View as HTML

[Free GPT-4]
[DeepSeek]

[PDF] jmlr.org

On the theory of policy gradient methods: Optimality, approximation, and distribution shift

A Agarwal, SM Kakade, JD Lee, G Mahajan - Journal of Machine Learning …, 2021 - jmlr.org

Policy gradient methods are among the most effective methods in challenging reinforcement
learning problems with large state and/or action spaces. However, little is known about even …

Save Cite Cited by 502 Related articles All 13 versions Free GPT-4 DeepSeek View as HTML

[Free GPT-4]
[DeepSeek]

[PDF] mlr.press

Optimality and approximation with policy gradient methods in markov decision processes

A Agarwal, SM Kakade, JD Lee… - … on Learning Theory, 2020 - proceedings.mlr.press

Policy gradient (PG) methods are among the most effective methods in challenging
reinforcement learning problems with large state and/or action spaces. However, little is …

Save Cite Cited by 411 Related articles All 3 versions Free GPT-4 DeepSeek View as HTML

[Free GPT-4]
[DeepSeek]

[PDF] arxiv.org

On the sample complexity of the linear quadratic regulator

S Dean, H Mania, N Matni, B Recht, S Tu - Foundations of Computational …, 2020 - Springer

This paper addresses the optimal control problem known as the linear quadratic regulator in
the case when the dynamics are unknown. We propose a multistage procedure, called …

Save Cite Cited by 657 Related articles All 12 versions Free GPT-4 DeepSeek

[Free GPT-4]
[DeepSeek]

[PDF] neurips.cc

Natural policy gradient primal-dual method for constrained markov decision processes

D Ding, K Zhang, T Basar… - Advances in Neural …, 2020 - proceedings.neurips.cc

We study sequential decision-making problems in which each agent aims to maximize the
expected total reward while satisfying a constraint on the expected total utility. We employ …

Save Cite Cited by 221 Related articles All 8 versions Free GPT-4 DeepSeek View as HTML

[Free GPT-4]
[DeepSeek]

[PDF] mlr.press

Provably efficient exploration in policy optimization

Q Cai, Z Yang, C **, Z Wang - International Conference on …, 2020 - proceedings.mlr.press

While policy-based reinforcement learning (RL) achieves tremendous successes in practice,
it is significantly less understood in theory, especially compared with value-based RL. In …

Save Cite Cited by 327 Related articles All 9 versions Free GPT-4 DeepSeek View as HTML

[Free GPT-4]
[DeepSeek]

[PDF] informs.org

Fast global convergence of natural policy gradient methods with entropy regularization

S Cen, C Cheng, Y Chen, Y Wei… - Operations …, 2022 - pubsonline.informs.org

Natural policy gradient (NPG) methods are among the most widely used policy optimization
algorithms in contemporary reinforcement learning. This class of methods is often applied in …

Save Cite Cited by 233 Related articles All 15 versions Free GPT-4 DeepSeek

[Free GPT-4]
[DeepSeek]

[PDF] neurips.cc

Solving a class of non-convex min-max games using iterative first order methods

M Nouiehed, M Sanjabi, T Huang… - Advances in …, 2019 - proceedings.neurips.cc

Recent applications that arise in machine learning have surged significant interest in solving
min-max saddle point games. This problem has been extensively studied in the convex …

Save Cite Cited by 395 Related articles All 10 versions Free GPT-4 DeepSeek View as HTML

Create alert

Cite

Advanced search

Saved to My library

Global convergence of policy gradient methods for the linear quadratic regulator

An overview of multi-agent reinforcement learning from game theoretical perspective

Toward a theoretical foundation of policy optimization for learning control policies

Bellman eluder dimension: New rich classes of rl problems, and sample-efficient algorithms

On the theory of policy gradient methods: Optimality, approximation, and distribution shift

Optimality and approximation with policy gradient methods in markov decision processes

On the sample complexity of the linear quadratic regulator

Natural policy gradient primal-dual method for constrained markov decision processes

Provably efficient exploration in policy optimization

Fast global convergence of natural policy gradient methods with entropy regularization

Solving a class of non-convex min-max games using iterative first order methods