Deep counterfactual regret minimization

N Brown, A Lerer, S Gross… - … conference on machine …, 2019 - proceedings.mlr.press
Counterfactual Regret Minimization (CFR) is the leading algorithm for solving large
imperfect-information games. It converges to an equilibrium by iteratively traversing the …
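
The snippet names CFR without showing the rule it iterates. As a rough, hedged illustration, here is a minimal sketch of the regret-matching step that tabular CFR applies at each information set; the action count and regret values are made up, and Deep CFR's actual contribution (approximating these tables with neural networks) is not shown.

import numpy as np

def regret_matching(cumulative_regret):
    """Map a vector of cumulative counterfactual regrets to a strategy."""
    positive = np.maximum(cumulative_regret, 0.0)
    total = positive.sum()
    if total > 0:
        return positive / total
    # If no action has positive regret, fall back to uniform play.
    return np.ones_like(positive) / len(positive)

# Toy usage: three actions with regrets accumulated over earlier traversals.
regrets = np.array([2.0, -1.0, 0.5])
print(regret_matching(regrets))  # -> [0.8, 0. , 0.2]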

Combining deep reinforcement learning and search for imperfect-information games

N Brown, A Bakhtin, A Lerer… - Advances in neural …, 2020 - proceedings.neurips.cc
The combination of deep reinforcement learning and search at both training and test time is
a powerful paradigm that has led to a number of successes in single-agent settings and …

A unified approach to reinforcement learning, quantal response equilibria, and two-player zero-sum games

S Sokota, R D'Orazio, JZ Kolter, N Loizou… - arXiv preprint arXiv …, 2022 - arxiv.org
This work studies an algorithm, which we call magnetic mirror descent, that is inspired by
mirror descent and the non-Euclidean proximal gradient algorithm. Our contribution is …
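
As I read the snippet, magnetic mirror descent augments a mirror-descent step with a term that pulls the iterate toward a "magnet" distribution. The sketch below implements the entropic, probability-simplex special case under that reading; the closed form, which solves argmin_x eta*<g,x> + alpha*eta*KL(x, magnet) + KL(x, x_t), is my own derivation from that description rather than a quotation of the paper, and the parameter names eta and alpha are mine.

import numpy as np

def mmd_simplex_step(x, g, magnet, eta=0.1, alpha=0.05):
    # Closed-form proximal step for the entropic mirror map on the simplex.
    logits = (np.log(x) + alpha * eta * np.log(magnet) - eta * g) / (1.0 + alpha * eta)
    w = np.exp(logits - logits.max())  # subtract max for numerical stability
    return w / w.sum()

# Toy usage: nudge a 3-action policy toward the uniform magnet while
# descending along a loss gradient g.
x = np.array([0.5, 0.3, 0.2])
g = np.array([1.0, 0.0, -1.0])
magnet = np.full(3, 1.0 / 3.0)
print(mmd_simplex_step(x, g, magnet))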

Robust multi-agent reinforcement learning with state uncertainty

S He, S Han, S Su, S Han, S Zou, F Miao - arXiv preprint arXiv:2307.16212, 2023 - arxiv.org
In real-world multi-agent reinforcement learning (MARL) applications, agents may not have
perfect state information (e.g., due to inaccurate measurement or malicious attacks), which …

Faster game solving via predictive blackwell approachability: Connecting regret matching and mirror descent

G Farina, C Kroer, T Sandholm - … of the AAAI Conference on Artificial …, 2021 - ojs.aaai.org
Blackwell approachability is a framework for reasoning about repeated games with vector-
valued payoffs. We introduce predictive Blackwell approachability, where an estimate of the …
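
Under my reading of the abstract, the predictive idea amounts to playing against an estimate of the next regret vector before it is observed. The sketch below shows a predictive regret-matching+-style learner in that spirit, using the previous instantaneous regret as the prediction; the game, opponent, and loop are illustrative and not taken from the paper.

import numpy as np

def prm_plus_strategy(R, m):
    # Play proportionally to the clipped sum of cumulative regret and prediction.
    w = np.maximum(R + m, 0.0)
    return w / w.sum() if w.sum() > 0 else np.full_like(w, 1.0 / len(w))

def prm_plus_update(R, instantaneous_regret):
    # Regret-matching+ style update: clip cumulative regrets at zero.
    return np.maximum(R + instantaneous_regret, 0.0)

# Toy usage: the row player of rock-paper-scissors against a fixed opponent.
payoff = np.array([[0, -1, 1], [1, 0, -1], [-1, 1, 0]], dtype=float)
opponent = np.array([0.5, 0.3, 0.2])
R = np.zeros(3)
m = np.zeros(3)                      # prediction of the next instantaneous regret
for _ in range(100):
    x = prm_plus_strategy(R, m)
    u = payoff @ opponent            # expected payoff of each action
    r = u - x @ u                    # instantaneous regret vector
    R = prm_plus_update(R, r)
    m = r                            # predict that the future resembles the present
print(prm_plus_strategy(R, m))       # concentrates on the best response (paper)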

From external to swap regret 2.0: An efficient reduction for large action spaces

Y Dagan, C Daskalakis, M Fishelson… - Proceedings of the 56th …, 2024 - dl.acm.org
We provide a novel reduction from swap-regret minimization to external-regret minimization,
which improves upon the classical reductions of Blum-Mansour and Stoltz-Lugosi in that it …
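
For context on the baseline the abstract says it improves, here is a hedged sketch of the classical Blum-Mansour-style reduction: run one external-regret learner per action, stack their outputs into a row-stochastic matrix Q, play a stationary distribution p = pQ, and feed learner a the loss scaled by p[a]. The multiplicative-weights learner, loss source, and parameters are placeholders of my choosing.

import numpy as np

class MW:
    def __init__(self, n, eta=0.1):
        self.w = np.ones(n)
        self.eta = eta
    def play(self):
        return self.w / self.w.sum()
    def update(self, loss):
        self.w *= np.exp(-self.eta * loss)

def stationary(Q, iters=200):
    p = np.full(Q.shape[0], 1.0 / Q.shape[0])
    for _ in range(iters):
        p = p @ Q                      # power iteration on the row-stochastic Q
    return p / p.sum()

n = 3
learners = [MW(n) for _ in range(n)]   # one external-regret learner per action
rng = np.random.default_rng(0)
for t in range(50):
    Q = np.stack([a.play() for a in learners])
    p = stationary(Q)                  # the strategy actually played this round
    loss = rng.random(n)               # stand-in for the adversary's loss vector
    for a, learner in enumerate(learners):
        learner.update(p[a] * loss)    # scale the loss by how often row a is "used"
print(p)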

Last-iterate convergence in extensive-form games

CW Lee, C Kroer, H Luo - Advances in Neural Information …, 2021 - proceedings.neurips.cc
Regret-based algorithms are highly efficient at finding approximate Nash equilibria in
sequential games such as poker games. However, most regret-based algorithms, including …
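
A hedged illustration of the average-iterate versus last-iterate distinction this abstract gestures at: in regret-matching self-play on matching pennies, the current strategies keep cycling rather than settling, while the time-averaged strategies approach the uniform equilibrium. The game, learner, and asymmetric initialization are my illustrative choices, not the paper's setting.

import numpy as np

def regret_matching(R):
    w = np.maximum(R, 0.0)
    return w / w.sum() if w.sum() > 0 else np.full_like(w, 1.0 / len(w))

A = np.array([[1.0, -1.0], [-1.0, 1.0]])      # matching pennies, row player maximizes
Rx, Ry = np.array([1.0, 0.0]), np.zeros(2)    # small asymmetry to start the dynamics
avg_x = np.zeros(2)
T = 5000
for t in range(T):
    x, y = regret_matching(Rx), regret_matching(Ry)
    ux, uy = A @ y, -(A.T @ x)                # each player's per-action values
    Rx += ux - x @ ux
    Ry += uy - y @ uy
    avg_x += x
print("last iterate:", regret_matching(Rx))   # keeps cycling from round to round
print("average     :", avg_x / T)             # approaches the (0.5, 0.5) equilibrium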

Learning in two-player zero-sum partially observable Markov games with perfect recall

T Kozuno, P Ménard, R Munos… - Advances in Neural …, 2021 - proceedings.neurips.cc
We study the problem of learning a Nash equilibrium (NE) in an extensive game with
imperfect information (EGII) through self-play. Precisely, we focus on two-player, zero-sum …

Kernelized multiplicative weights for 0/1-polyhedral games: Bridging the gap between learning in extensive-form and normal-form games

G Farina, CW Lee, H Luo… - … Conference on Machine …, 2022 - proceedings.mlr.press
While extensive-form games (EFGs) can be converted into normal-form games (NFGs),
doing so comes at the cost of an exponential blowup of the strategy space. So, progress on …
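
As background for the blowup the snippet describes, here is the plain multiplicative-weights update over the pure strategies of a normal-form game, i.e. the update whose strategy space becomes exponentially large when an extensive-form game is flattened. The kernel trick the paper introduces (simulating this update over the vertices of a 0/1 polytope without enumerating them) is not reproduced here; the losses and strategy count are made up.

import numpy as np

def mwu_step(weights, loss, eta=0.1):
    new = weights * np.exp(-eta * loss)     # exponential reweighting by observed loss
    return new / new.sum()                  # renormalize to a distribution

# Toy usage: four pure strategies, one observed loss vector per round.
w = np.full(4, 0.25)
for loss in ([1.0, 0.0, 0.5, 0.2], [0.3, 0.9, 0.1, 0.4]):
    w = mwu_step(w, np.asarray(loss))
print(w)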

Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic polyak stepsize

R D'Orazio, N Loizou, I Laradji, I Mitliagkas - arXiv preprint arXiv …, 2021 - arxiv.org
We investigate the convergence of stochastic mirror descent (SMD) under interpolation in
relatively smooth and smooth convex optimization. In relatively smooth convex optimization …
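
The snippet does not state the paper's stepsize, so the sketch below pairs an entropic stochastic mirror descent step on the simplex with the standard stochastic Polyak stepsize, gamma = (f_i(x) - f_i*) / (c * ||grad f_i(x)||^2) with f_i* = 0 under the interpolation assumption the abstract mentions; this is a stand-in of my choosing, and the toy least-squares-on-the-simplex problem is likewise illustrative.

import numpy as np

def smd_entropic_step(x, grad, gamma):
    # Entropic mirror descent: multiplicative update followed by renormalization.
    w = x * np.exp(-gamma * grad)
    return w / w.sum()

rng = np.random.default_rng(0)
A = rng.random((20, 5))
x_true = np.full(5, 0.2)
b = A @ x_true                                  # interpolation: every f_i vanishes at x_true
x = rng.dirichlet(np.ones(5))                   # random starting point on the simplex
print("initial max residual:", np.abs(A @ x - b).max())
for t in range(2000):
    i = rng.integers(len(b))                    # sample one component function f_i
    resid = A[i] @ x - b[i]
    f_i = 0.5 * resid ** 2
    g = resid * A[i]                            # gradient of f_i at x
    gamma = f_i / (0.5 * (g @ g) + 1e-12)       # Polyak-style stepsize with c = 0.5
    x = smd_entropic_step(x, g, gamma)
print("final max residual:  ", np.abs(A @ x - b).max())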