An overview of multi-agent reinforcement learning from game theoretical perspective
Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org
Following the remarkable success of the AlphaGO series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …
Combining deep reinforcement learning and search for imperfect-information games
The combination of deep reinforcement learning and search at both training and test time is
a powerful paradigm that has led to a number of successes in single-agent settings and …
Student of Games: A unified learning algorithm for both perfect and imperfect information games
Games have a long history as benchmarks for progress in artificial intelligence. Approaches
using search and learning produced strong performance across many perfect information …
Causal multi-agent reinforcement learning: Review and open problems
This paper serves to introduce the reader to the field of multi-agent reinforcement learning
(MARL) and its intersection with methods from the study of causality. We highlight key …
Improving policies via search in cooperative partially observable games
Recent superhuman results in games have largely been achieved in a variety of zero-sum
settings, such as Go and Poker, in which agents need to compete against others. However …
Near-optimal learning of extensive-form games with imperfect information
This paper resolves the open question of designing near-optimal algorithms for learning
imperfect-information extensive-form games from bandit feedback. We present the first line …
DREAM: Deep regret minimization with advantage baselines and model-free learning
We introduce DREAM, a deep reinforcement learning algorithm that finds optimal strategies
in imperfect-information games with multiple agents. Formally, DREAM converges to a Nash …
ESCHER: Eschewing importance sampling in games by computing a history value function to estimate regret
Recent techniques for approximating Nash equilibria in very large games leverage neural
networks to learn approximately optimal policies (strategies). One promising line of research …
Honeypot allocation for cyber deception under uncertainty
Cyber deception aims to misrepresent the state of the network to mislead the attackers,
falsify their reconnaissance conclusions, and deflect them away from their goals. Honeypots …
HSVI can solve zero-sum partially observable stochastic games
State-of-the-art methods for solving 2-player zero-sum imperfect information games rely on
linear programming or regret minimization, though not on dynamic programming (DP) or …