Discovered policy optimisation

C Lu, J Kuba, A Letcher, L Metz… - Advances in …, 2022 - proceedings.neurips.cc
Tremendous progress has been made in reinforcement learning (RL) over the past decade.
Most of these advancements came through the continual development of new algorithms …

Linear convergence of natural policy gradient methods with log-linear policies

R Yuan, SS Du, RM Gower, A Lazaric… - arXiv preprint arXiv …, 2022 - arxiv.org
We consider infinite-horizon discounted Markov decision processes and study the
convergence rates of the natural policy gradient (NPG) and the Q-NPG methods with the log …

Meta-Black-Box optimization for evolutionary algorithms: Review and perspective

X Yang, R Wang, K Li, H Ishibuchi - Swarm and Evolutionary Computation, 2025 - Elsevier
Black-Box Optimization (BBO) is increasingly vital for addressing complex real-world
optimization challenges, where traditional methods fall short due to their reliance on …

Reinforcement Learning: An Overview

K Murphy - arXiv preprint arXiv:2412.05265, 2024 - arxiv.org
This manuscript gives a big-picture, up-to-date overview of the field of (deep) reinforcement
learning and sequential decision making, covering value-based RL, policy-gradient …

A novel framework for policy mirror descent with general parameterization and linear convergence

C Alfano, R Yuan, P Rebeschini - Advances in Neural …, 2023 - proceedings.neurips.cc
Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe
their success to the use of parameterized policies. However, while theoretical guarantees …

Heterogeneous-agent reinforcement learning

Y Zhong, JG Kuba, X Feng, S Hu, J Ji, Y Yang - Journal of Machine …, 2024 - jmlr.org
The necessity for cooperation among intelligent machines has popularised cooperative
multi-agent reinforcement learning (MARL) in AI research. However, many research endeavours …

Proximal learning with opponent-learning awareness

S Zhao, C Lu, RB Grosse… - Advances in Neural …, 2022 - proceedings.neurips.cc
Learning With Opponent-Learning Awareness (LOLA) (Foerster et al. [2018a]) is a
multi-agent reinforcement learning algorithm that typically learns reciprocity-based …

Discovering temporally-aware reinforcement learning algorithms

MT Jackson, C Lu, L Kirsch, RT Lange… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in meta-learning have enabled the automatic discovery of novel
reinforcement learning algorithms parameterized by surrogate objective functions. To …

Heterogeneous-agent mirror learning: A continuum of solutions to cooperative MARL

JG Kuba, X Feng, S Ding, H Dong, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
The necessity for cooperation among intelligent machines has popularised cooperative
multi-agent reinforcement learning (MARL) in the artificial intelligence (AI) research community …

Mutual-Information Regularized Multi-Agent Policy Iteration

D Ye, Z Lu - Advances in Neural Information Processing …, 2024 - proceedings.neurips.cc
Despite the success of cooperative multi-agent reinforcement learning algorithms, most of
them focus on a single team composition, which prevents them from being used in more …