Nash learning from human feedback

R Munos, M Valko, D Calandriello, MG Azar… - arXiv preprint arXiv …, 2023 - ai-plans.com
Large language models (LLMs) (Anil et al., 2023; Glaese et al., 2022; OpenAI, 2023; Ouyang
et al., 2022) have made remarkable strides in enhancing natural language understanding …

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Training GANs with optimism

C Daskalakis, A Ilyas, V Syrgkanis, H Zeng - arXiv preprint arXiv …, 2017 - arxiv.org
We address the issue of limit cycling behavior in training Generative Adversarial Networks
and propose the use of Optimistic Mirror Descent (OMD) for training Wasserstein GANs …

Global convergence of multi-agent policy gradient in Markov potential games

S Leonardos, W Overman, I Panageas… - arXiv preprint arXiv …, 2021 - arxiv.org
Potential games are arguably one of the most important and widely studied classes of
normal form games. They define the archetypal setting of multi-agent coordination as all …

Combining deep reinforcement learning and search for imperfect-information games

N Brown, A Bakhtin, A Lerer… - Advances in Neural …, 2020 - proceedings.neurips.cc
The combination of deep reinforcement learning and search at both training and test time is
a powerful paradigm that has led to a number of successes in single-agent settings and …

Near-optimal no-regret learning in general games

C Daskalakis, M Fishelson… - Advances in Neural …, 2021 - proceedings.neurips.cc
We show that Optimistic Hedge, a common variant of multiplicative-weights-updates
with recency bias, attains $\mathrm{poly}(\log T)$ regret in multi-player general-sum …

When can we learn general-sum Markov games with a large number of players sample-efficiently?

Z Song, S Mei, Y Bai - arXiv preprint arXiv:2110.04184, 2021 - arxiv.org
Multi-agent reinforcement learning has made substantial empirical progress in solving
games with a large number of players. However, theoretically, the best known sample …

Model-based multi-agent RL in zero-sum Markov games with near-optimal sample complexity

K Zhang, S Kakade, T Basar… - Advances in Neural …, 2020 - proceedings.neurips.cc
Model-based reinforcement learning (RL), which finds an optimal policy using an
empirical model, has long been recognized as one of the cornerstones of RL. It is especially …

Online learning algorithms

N Cesa-Bianchi, F Orabona - Annual review of statistics and its …, 2021 - annualreviews.org
Online learning is a framework for the design and analysis of algorithms that build predictive
models by processing data one at a time. Besides being computationally efficient, online …

Accelerated Algorithms for Smooth Convex-Concave Minimax Problems with $O(1/k^2)$ Rate on Squared Gradient Norm

TH Yoon, EK Ryu - International Conference on Machine …, 2021 - proceedings.mlr.press
In this work, we study the computational complexity of reducing the squared gradient
magnitude for smooth minimax optimization problems. First, we present algorithms with …