Multi-agent reinforcement learning: A selective overview of theories and algorithms

K Zhang, Z Yang, T Başar - Handbook of reinforcement learning and …, 2021 - Springer
Recent years have witnessed significant advances in reinforcement learning (RL), which
has registered tremendous success in solving various sequential decision-making problems …

An overview of multi-agent reinforcement learning from game theoretical perspective

Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org
Following the remarkable success of the AlphaGo series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …

Near-optimal learning of extensive-form games with imperfect information

Y Bai, C Jin, S Mei, T Yu - International Conference on …, 2022 - proceedings.mlr.press
This paper resolves the open question of designing near-optimal algorithms for learning
imperfect-information extensive-form games from bandit feedback. We present the first line …

Efficient deviation types and learning for hindsight rationality in extensive-form games

D Morrill, R D'Orazio, M Lanctot… - International …, 2021 - proceedings.mlr.press
Hindsight rationality is an approach to playing general-sum games that prescribes no-regret
learning dynamics for individual agents with respect to a set of deviations, and further …

Double neural counterfactual regret minimization

H Li, K Hu, Z Ge, T Jiang, Y Qi, L Song - arXiv preprint arXiv:1812.10607, 2018 - arxiv.org
Counterfactual Regret Minimization (CFR) is a fundamental and effective technique for
solving Imperfect Information Games (IIG). However, the original CFR algorithm only works …

Escher: Eschewing importance sampling in games by computing a history value function to estimate regret

S McAleer, G Farina, M Lanctot, T Sandholm - arXiv preprint arXiv …, 2022 - arxiv.org
Recent techniques for approximating Nash equilibria in very large games leverage neural
networks to learn approximately optimal policies (strategies). One promising line of research …

Single deep counterfactual regret minimization

E Steinberger - arXiv preprint arXiv:1901.07621, 2019 - arxiv.org
Counterfactual Regret Minimization (CFR) is the most successful algorithm for finding
approximate Nash equilibria in imperfect information games. However, CFR's reliance on …

Time and space: Why imperfect information games are hard

N Burch - 2018 - era.library.ualberta.ca
Decision-making problems with two agents can be modeled as two-player games, and a
Nash equilibrium is the basic solution concept describing good play in adversarial games …

Steering language models with game-theoretic solvers

I Gemp, R Patel, Y Bachrach, M Lanctot… - … Markets Workshop at …, 2024 - openreview.net
Mathematical models of strategic interactions among rational agents have long been studied
in game theory. However, the interactions studied are often over a small set of discrete …

The advantage regret-matching actor-critic

A Gruslys, M Lanctot, R Munos, F Timbers… - arXiv preprint arXiv …, 2020 - arxiv.org
Regret minimization has played a key role in online learning, equilibrium computation in
games, and reinforcement learning (RL). In this paper, we describe a general model-free RL …