Deep counterfactual regret minimization
Counterfactual Regret Minimization (CFR) is the leading algorithm for solving large
imperfect-information games. It converges to an equilibrium by iteratively traversing the …
Combining deep reinforcement learning and search for imperfect-information games
The combination of deep reinforcement learning and search at both training and test time is
a powerful paradigm that has led to a number of successes in single-agent settings and …
A unified approach to reinforcement learning, quantal response equilibria, and two-player zero-sum games
This work studies an algorithm, which we call magnetic mirror descent, that is inspired by
mirror descent and the non-Euclidean proximal gradient algorithm. Our contribution is …
Robust multi-agent reinforcement learning with state uncertainty
In real-world multi-agent reinforcement learning (MARL) applications, agents may not have
perfect state information (e.g., due to inaccurate measurement or malicious attacks), which …
Faster game solving via predictive Blackwell approachability: Connecting regret matching and mirror descent
Blackwell approachability is a framework for reasoning about repeated games with vector-
valued payoffs. We introduce predictive Blackwell approachability, where an estimate of the …
From external to swap regret 2.0: An efficient reduction for large action spaces
We provide a novel reduction from swap-regret minimization to external-regret minimization,
which improves upon the classical reductions of Blum-Mansour and Stoltz-Lugosi in that it …
Last-iterate convergence in extensive-form games
Regret-based algorithms are highly efficient at finding approximate Nash equilibria in
sequential games such as poker games. However, most regret-based algorithms, including …
Learning in two-player zero-sum partially observable Markov games with perfect recall
We study the problem of learning a Nash equilibrium (NE) in an extensive game with
imperfect information (EGII) through self-play. Precisely, we focus on two-player, zero-sum …
Kernelized multiplicative weights for 0/1-polyhedral games: Bridging the gap between learning in extensive-form and normal-form games
While extensive-form games (EFGs) can be converted into normal-form games (NFGs),
doing so comes at the cost of an exponential blowup of the strategy space. So, progress on …
Stochastic mirror descent: Convergence analysis and adaptive variants via the mirror stochastic Polyak stepsize
We investigate the convergence of stochastic mirror descent (SMD) under interpolation in
relatively smooth and smooth convex optimization. In relatively smooth convex optimization …