Multi-agent reinforcement learning: A selective overview of theories and algorithms

K Zhang, Z Yang, T Başar - Handbook of reinforcement learning and …, 2021 - Springer
Recent years have witnessed significant advances in reinforcement learning (RL), which
has registered tremendous success in solving various sequential decision-making problems …

An overview of multi-agent reinforcement learning from game theoretical perspective

Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org
Following the remarkable success of the AlphaGo series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …

Human-level play in the game of Diplomacy by combining language models with strategic reasoning

Meta Fundamental AI Research Diplomacy Team … - Science, 2022 - science.org
Despite much progress in training artificial intelligence (AI) systems to imitate human
language, building agents that use language to communicate intentionally with humans in …

Solving imperfect-information games via discounted regret minimization

N Brown, T Sandholm - Proceedings of the AAAI Conference on Artificial …, 2019 - aaai.org
Counterfactual regret minimization (CFR) is a family of iterative algorithms that are the most
popular and, in practice, fastest approach to approximately solving large …

Safe and nested subgame solving for imperfect-information games

N Brown, T Sandholm - Advances in neural information …, 2017 - proceedings.neurips.cc
In imperfect-information games, the optimal strategy in a subgame may depend on the
strategy in other, unreached subgames. Thus a subgame cannot be solved in isolation and …

Actor-critic policy optimization in partially observable multiagent environments

S Srinivasan, M Lanctot, V Zambaldi… - Advances in neural …, 2018 - proceedings.neurips.cc
Optimization of parameterized policies for reinforcement learning (RL) is an important and
challenging problem in artificial intelligence. Among the most common approaches are …

Mastering the game of no-press Diplomacy via human-regularized reinforcement learning and planning

A Bakhtin, DJ Wu, A Lerer, J Gray, AP Jacob… - arXiv preprint arXiv …, 2022 - arxiv.org
No-press Diplomacy is a complex strategy game involving both cooperation and competition
that has served as a benchmark for multi-agent AI research. While self-play reinforcement …

XDO: A double oracle algorithm for extensive-form games

S McAleer, JB Lanier, KA Wang… - Advances in Neural …, 2021 - proceedings.neurips.cc
Policy Space Response Oracles (PSRO) is a reinforcement learning (RL) algorithm
for two-player zero-sum games that has been empirically shown to find approximate Nash …

Faster game solving via predictive Blackwell approachability: Connecting regret matching and mirror descent

G Farina, C Kroer, T Sandholm - … of the AAAI Conference on Artificial …, 2021 - ojs.aaai.org
Blackwell approachability is a framework for reasoning about repeated games with vector-
valued payoffs. We introduce predictive Blackwell approachability, where an estimate of the …

Computing approximate equilibria in sequential adversarial games by exploitability descent

E Lockhart, M Lanctot, J Pérolat, JB Lespiau… - arXiv preprint arXiv …, 2019 - arxiv.org
In this paper, we present exploitability descent, a new algorithm to compute approximate
equilibria in two-player zero-sum extensive-form games with imperfect information, by direct …