An overview of multi-agent reinforcement learning from game theoretical perspective

Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org
Following the remarkable success of the AlphaGO series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …

Combining deep reinforcement learning and search for imperfect-information games

N Brown, A Bakhtin, A Lerer… - Advances in Neural …, 2020 - proceedings.neurips.cc
The combination of deep reinforcement learning and search at both training and test time is
a powerful paradigm that has led to a number of successes in single-agent settings and …

Student of Games: A unified learning algorithm for both perfect and imperfect information games

M Schmid, M Moravčík, N Burch, R Kadlec… - Science …, 2023 - science.org
Games have a long history as benchmarks for progress in artificial intelligence. Approaches
using search and learning produced strong performance across many perfect information …

Causal multi-agent reinforcement learning: Review and open problems

SJ Grimbly, J Shock, A Pretorius - arXiv preprint arXiv:2111.06721, 2021 - arxiv.org
This paper serves to introduce the reader to the field of multi-agent reinforcement learning
(MARL) and its intersection with methods from the study of causality. We highlight key …

Improving policies via search in cooperative partially observable games

A Lerer, H Hu, J Foerster, N Brown - … of the AAAI conference on artificial …, 2020 - ojs.aaai.org
Recent superhuman results in games have largely been achieved in a variety of zero-sum
settings, such as Go and Poker, in which agents need to compete against others. However …

Near-optimal learning of extensive-form games with imperfect information

Y Bai, C Jin, S Mei, T Yu - International Conference on …, 2022 - proceedings.mlr.press
This paper resolves the open question of designing near-optimal algorithms for learning
imperfect-information extensive-form games from bandit feedback. We present the first line …

DREAM: Deep regret minimization with advantage baselines and model-free learning

E Steinberger, A Lerer, N Brown - arXiv preprint arXiv:2006.10410, 2020 - arxiv.org
We introduce DREAM, a deep reinforcement learning algorithm that finds optimal strategies
in imperfect-information games with multiple agents. Formally, DREAM converges to a Nash …

ESCHER: Eschewing importance sampling in games by computing a history value function to estimate regret

S McAleer, G Farina, M Lanctot, T Sandholm - arXiv preprint arXiv …, 2022 - arxiv.org
Recent techniques for approximating Nash equilibria in very large games leverage neural
networks to learn approximately optimal policies (strategies). One promising line of research …

Honeypot allocation for cyber deception under uncertainty

AH Anwar, CA Kamhoua, NO Leslie… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Cyber deception aims to misrepresent the state of the network to mislead the attackers,
falsify their reconnaissance conclusions, and deflect them away from their goals. Honeypots …

HSVI can solve zero-sum partially observable stochastic games

A Delage, O Buffet, JS Dibangoye… - Dynamic Games and …, 2024 - Springer
State-of-the-art methods for solving 2-player zero-sum imperfect information games rely on
linear programming or regret minimization, though not on dynamic programming (DP) or …