- Academic Search

Multi-agent reinforcement learning: A selective overview of theories and algorithms

K Zhang, Z Yang, T Başar - Handbook of reinforcement learning and …, 2021 - Springer

Recent years have witnessed significant advances in reinforcement learning (RL), which
has registered tremendous success in solving various sequential decision-making problems …

Cited by 1719

An overview of multi-agent reinforcement learning from game theoretical perspective

Y Yang, J Wang - arXiv preprint arXiv:2011.00583, 2020 - arxiv.org

Following the remarkable success of the AlphaGO series, 2019 was a booming year that
witnessed significant advances in multi-agent reinforcement learning (MARL) techniques …

Cited by 352

Human-level play in the game of Diplomacy by combining language models with strategic reasoning

Meta Fundamental AI Research Diplomacy Team … - Science, 2022 - science.org

Despite much progress in training artificial intelligence (AI) systems to imitate human
language, building agents that use language to communicate intentionally with humans in …

Cited by 263

Solving imperfect-information games via discounted regret minimization

N Brown, T Sandholm - Proceedings of the AAAI Conference on Artificial …, 2019 - aaai.org

Counterfactual regret minimization (CFR) is a family of iterative algorithms that are the most
popular and, in practice, fastest approach to approximately solving large …

Cited by 188

Safe and nested subgame solving for imperfect-information games

N Brown, T Sandholm - Advances in neural information …, 2017 - proceedings.neurips.cc

In imperfect-information games, the optimal strategy in a subgame may depend on the
strategy in other, unreached subgames. Thus a subgame cannot be solved in isolation and …

Cited by 209

Actor-critic policy optimization in partially observable multiagent environments

S Srinivasan, M Lanctot, V Zambaldi… - Advances in neural …, 2018 - proceedings.neurips.cc

Optimization of parameterized policies for reinforcement learning (RL) is an important and
challenging problem in artificial intelligence. Among the most common approaches are …

Cited by 173

Mastering the game of no-press Diplomacy via human-regularized reinforcement learning and planning

A Bakhtin, DJ Wu, A Lerer, J Gray, AP Jacob… - arXiv preprint arXiv …, 2022 - arxiv.org

No-press Diplomacy is a complex strategy game involving both cooperation and competition
that has served as a benchmark for multi-agent AI research. While self-play reinforcement …

Cited by 49

XDO: A double oracle algorithm for extensive-form games

S McAleer, JB Lanier, KA Wang… - Advances in Neural …, 2021 - proceedings.neurips.cc

Policy Space Response Oracles (PSRO) is a reinforcement learning (RL) algorithm
for two-player zero-sum games that has been empirically shown to find approximate Nash …

Cited by 65

Faster game solving via predictive Blackwell approachability: Connecting regret matching and mirror descent

G Farina, C Kroer, T Sandholm - … of the AAAI Conference on Artificial …, 2021 - ojs.aaai.org

Blackwell approachability is a framework for reasoning about repeated games with vector-
valued payoffs. We introduce predictive Blackwell approachability, where an estimate of the …

Cited by 79

Computing approximate equilibria in sequential adversarial games by exploitability descent

E Lockhart, M Lanctot, J Pérolat, JB Lespiau… - arXiv preprint arXiv …, 2019 - arxiv.org

In this paper, we present exploitability descent, a new algorithm to compute approximate
equilibria in two-player zero-sum extensive-form games with imperfect information, by direct …

Cited by 88


Dynamic thresholding and pruning for regret minimization

Multi-agent reinforcement learning: A selective overview of theories and algorithms
