Google Scholar

Multi-agent reinforcement learning: A selective overview of theories and algorithms

K Zhang, Z Yang, T Başar - Handbook of reinforcement learning and …, 2021 - Springer

Recent years have witnessed significant advances in reinforcement learning (RL), which
has registered tremendous success in solving various sequential decision-making problems …

Cited by 1719

[PDF] arxiv.org

An overview of multi-agent reinforcement learning from game theoretical perspective

Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org

Following the remarkable success of the AlphaGO series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …

Cited by 352

[PDF] mlr.press

Near-optimal learning of extensive-form games with imperfect information

Y Bai, C Jin, S Mei, T Yu - International Conference on …, 2022 - proceedings.mlr.press

This paper resolves the open question of designing near-optimal algorithms for learning
imperfect-information extensive-form games from bandit feedback. We present the first line …

Cited by 33

[PDF] mlr.press

Efficient deviation types and learning for hindsight rationality in extensive-form games

D Morrill, R D'Orazio, M Lanctot… - International …, 2021 - proceedings.mlr.press

Hindsight rationality is an approach to playing general-sum games that prescribes no-regret
learning dynamics for individual agents with respect to a set of deviations, and further …

Cited by 38

[PDF] arxiv.org

Double neural counterfactual regret minimization

H Li, K Hu, Z Ge, T Jiang, Y Qi, L Song - arXiv preprint arXiv:1812.10607, 2018 - arxiv.org

Counterfactual Regret Minimization (CFR) is a fundamental and effective technique for
solving Imperfect Information Games (IIG). However, the original CFR algorithm only works …

Cited by 69

[PDF] arxiv.org

Escher: Eschewing importance sampling in games by computing a history value function to estimate regret

S McAleer, G Farina, M Lanctot, T Sandholm - arXiv preprint arXiv …, 2022 - arxiv.org

Recent techniques for approximating Nash equilibria in very large games leverage neural
networks to learn approximately optimal policies (strategies). One promising line of research …

Cited by 23

[PDF] arxiv.org

Single deep counterfactual regret minimization

E Steinberger - arXiv preprint arXiv:1901.07621, 2019 - arxiv.org

Counterfactual Regret Minimization (CFR) is the most successful algorithm for finding
approximate Nash equilibria in imperfect information games. However, CFR's reliance on …

Cited by 52

[PDF] ualberta.ca

Time and space: Why imperfect information games are hard

N Burch - 2018 - era.library.ualberta.ca

Decision-making problems with two agents can be modeled as two-player games, and a
Nash equilibrium is the basic solution concept describing good play in adversarial games …

Cited by 55

[PDF] openreview.net

Steering language models with game-theoretic solvers

I Gemp, R Patel, Y Bachrach, M Lanctot… - … Markets Workshop at …, 2024 - openreview.net

Mathematical models of strategic interactions among rational agents have long been studied
in game theory. However, the interactions studied are often over a small set of discrete …

Cited by 4

[PDF] arxiv.org

The advantage regret-matching actor-critic

A Gruslys, M Lanctot, R Munos, F Timbers… - arXiv preprint arXiv …, 2020 - arxiv.org

Regret minimization has played a key role in online learning, equilibrium computation in
games, and reinforcement learning (RL). In this paper, we describe a general model-free RL …

Cited by 26
