Nash learning from human feedback

R Munos, M Valko, D Calandriello, MG Azar… - arXiv preprint arXiv …, 2023 - ai-plans.com
Large language models (LLMs) (Anil et al., 2023; Glaese et al., 2022; OpenAI, 2023; Ouyang
et al., 2022) have made remarkable strides in enhancing natural language understanding …

Mastering the game of Stratego with model-free multiagent reinforcement learning

J Perolat, B De Vylder, D Hennes, E Tarassov, F Strub… - Science, 2022 - science.org
We introduce DeepNash, an autonomous agent that plays the imperfect information game
Stratego at a human expert level. Stratego is one of the few iconic board games that artificial …

Language agents with reinforcement learning for strategic play in the Werewolf game

Z Xu, C Yu, F Fang, Y Wang, Y Wu - arXiv preprint arXiv:2310.18940, 2023 - arxiv.org
Agents built with large language models (LLMs) have shown great potential across a wide
range of domains. However, in complex decision-making tasks, pure LLM-based agents …

Learning in games: a systematic review

RJ Qin, Y Yu - Science China Information Sciences, 2024 - Springer
Game theory studies the mathematical models for self-interested individuals. Nash
equilibrium is arguably the most central solution in game theory. While finding the Nash …

Sample and communication-efficient decentralized actor-critic algorithms with finite-time analysis

Z Chen, Y Zhou, RR Chen… - … Conference on Machine …, 2022 - proceedings.mlr.press
Actor-critic (AC) algorithms have been widely used in decentralized multi-agent systems to
learn the optimal joint control policy. However, existing decentralized AC algorithms either …

Navigating the landscape of multiplayer games

S Omidshafiei, K Tuyls, WM Czarnecki… - Nature …, 2020 - nature.com
Multiplayer games have long been used as testbeds in artificial intelligence research, aptly
referred to as the Drosophila of artificial intelligence. Traditionally, researchers have focused …

ESCHER: Eschewing importance sampling in games by computing a history value function to estimate regret

S McAleer, G Farina, M Lanctot, T Sandholm - arXiv preprint arXiv …, 2022 - arxiv.org
Recent techniques for approximating Nash equilibria in very large games leverage neural
networks to learn approximately optimal policies (strategies). One promising line of research …

Reactive exploration to cope with non-stationarity in lifelong reinforcement learning

CA Steinparz, T Schmied, F Paischer… - Conference on …, 2022 - proceedings.mlr.press
In lifelong learning an agent learns throughout its entire life without resets, in a constantly
changing environment, as we humans do. Consequently, lifelong learning comes with a …

Fictitious cross-play: Learning global Nash equilibrium in mixed cooperative-competitive games

Z Xu, Y Liang, C Yu, Y Wang, Y Wu - arXiv preprint arXiv:2310.03354, 2023 - arxiv.org
Self-play (SP) is a popular multi-agent reinforcement learning (MARL) framework for solving
competitive games, where each agent optimizes policy by treating others as part of the …

A survey on self-play methods in reinforcement learning

R Zhang, Z Xu, C Ma, C Yu, WW Tu, S Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
Self-play, characterized by agents' interactions with copies or past versions of themselves, has
recently gained prominence in reinforcement learning. This paper first clarifies the …