- Academic Search

Discovered policy optimisation

C Lu, J Kuba, A Letcher, L Metz… - Advances in …, 2022 - proceedings.neurips.cc

Tremendous progress has been made in reinforcement learning (RL) over the past decade.
Most of these advancements came through the continual development of new algorithms …

Cited by 82 - All 6 versions

[PDF] arxiv.org

Linear convergence of natural policy gradient methods with log-linear policies

R Yuan, SS Du, RM Gower, A Lazaric… - arXiv preprint arXiv …, 2022 - arxiv.org

We consider infinite-horizon discounted Markov decision processes and study the
convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log …

Cited by 42 - All 7 versions

[PDF] ssrn.com

Meta-Black-Box optimization for evolutionary algorithms: Review and perspective

X Yang, R Wang, K Li, H Ishibuchi - Swarm and Evolutionary Computation, 2025 - Elsevier

Black-Box Optimization (BBO) is increasingly vital for addressing complex real-world
optimization challenges, where traditional methods fall short due to their reliance on …

Cited by 1 - All 2 versions

[PDF] arxiv.org

Reinforcement Learning: An Overview

K Murphy - arXiv preprint arXiv:2412.05265, 2024 - arxiv.org

This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement
learning and sequential decision making, covering value-based RL, policy-gradient …

Cited by 1

[PDF] neurips.cc

A novel framework for policy mirror descent with general parameterization and linear convergence

C Alfano, R Yuan, P Rebeschini - Advances in Neural …, 2023 - proceedings.neurips.cc

Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe
their success to the use of parameterized policies. However, while theoretical guarantees …

Cited by 20 - All 8 versions

[PDF] jmlr.org

[PDF] Heterogeneous-agent reinforcement learning

Y Zhong, JG Kuba, X Feng, S Hu, J Ji, Y Yang - Journal of Machine …, 2024 - jmlr.org

The necessity for cooperation among intelligent machines has popularised cooperative
multi-agent reinforcement learning (MARL) in AI research. However, many research endeavours …

Cited by 42 - All 3 versions

[PDF] neurips.cc

Proximal learning with opponent-learning awareness

S Zhao, C Lu, RB Grosse… - Advances in Neural …, 2022 - proceedings.neurips.cc

Learning With Opponent-Learning Awareness (LOLA) (Foerster et al. [2018a]) is a
multi-agent reinforcement learning algorithm that typically learns reciprocity-based …

Cited by 23 - All 5 versions

[PDF] arxiv.org

Discovering temporally-aware reinforcement learning algorithms

MT Jackson, C Lu, L Kirsch, RT Lange… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advancements in meta-learning have enabled the automatic discovery of novel
reinforcement learning algorithms parameterized by surrogate objective functions. To …

Cited by 16 - All 4 versions

[PDF] arxiv.org

Heterogeneous-agent mirror learning: A continuum of solutions to cooperative MARL

JG Kuba, X Feng, S Ding, H Dong, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org

The necessity for cooperation among intelligent machines has popularised cooperative
multi-agent reinforcement learning (MARL) in the artificial intelligence (AI) research community …

Cited by 21

[PDF] neurips.cc

Mutual-Information Regularized Multi-Agent Policy Iteration

D Ye, Z Lu - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc

Despite the success of cooperative multi-agent reinforcement learning algorithms, most of
them focus on a single team composition, which prevents them from being used in more …


Discovered policy optimisation
