Nash learning from human feedback
Large language models (LLMs) (Anil et al., 2023; Glaese et al., 2022; OpenAI, 2023; Ouyang
et al., 2022) have made remarkable strides in enhancing natural language understanding …
Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Training GANs with optimism
We address the issue of limit cycling behavior in training Generative Adversarial Networks
and propose the use of Optimistic Mirror Descent (OMD) for training Wasserstein GANs …
Global convergence of multi-agent policy gradient in Markov potential games
Potential games are arguably one of the most important and widely studied classes of
normal form games. They define the archetypal setting of multi-agent coordination as all …
Combining deep reinforcement learning and search for imperfect-information games
The combination of deep reinforcement learning and search at both training and test time is
a powerful paradigm that has led to a number of successes in single-agent settings and …
Near-optimal no-regret learning in general games
C Daskalakis, M Fishelson… - Advances in Neural …, 2021 - proceedings.neurips.cc
We show that Optimistic Hedge, a common variant of multiplicative-weights-
updates with recency bias, attains $\mathrm{poly}(\log T)$ regret in multi-player general-sum …
When can we learn general-sum Markov games with a large number of players sample-efficiently?
Multi-agent reinforcement learning has made substantial empirical progress in solving
games with a large number of players. However, theoretically, the best known sample …
Model-based multi-agent RL in zero-sum Markov games with near-optimal sample complexity
Model-based reinforcement learning (RL), which finds an optimal policy using an
empirical model, has long been recognized as one of the cornerstones of RL. It is especially …
Online learning algorithms
N Cesa-Bianchi, F Orabona - Annual review of statistics and its …, 2021 - annualreviews.org
Online learning is a framework for the design and analysis of algorithms that build predictive
models by processing data one at a time. Besides being computationally efficient, online …
Accelerated Algorithms for Smooth Convex-Concave Minimax Problems with $O(1/k^2)$ Rate on Squared Gradient Norm
In this work, we study the computational complexity of reducing the squared gradient
magnitude for smooth minimax optimization problems. First, we present algorithms with …