- Academic Search

Nash learning from human feedback

R Munos, M Valko, D Calandriello, MG Azar… - arxiv preprint arxiv …, 2023 - ai-plans.com

Large language models (LLMs) (Anil et al., 2023; Glaese et al., 2022; OpenAI, 2023; Ouyang
et al., 2022) have made remarkable strides in enhancing natural language understanding …

Cited by 90 · 5 versions

[PDF] nowpublishers.com

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com

Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Cited by 1253 · 7 versions

[PDF] arxiv.org

Training GANs with optimism

C Daskalakis, A Ilyas, V Syrgkanis, H Zeng - arxiv preprint arxiv …, 2017 - arxiv.org

We address the issue of limit cycling behavior in training Generative Adversarial Networks
and propose the use of Optimistic Mirror Descent (OMD) for training Wasserstein GANs …

Cited by 602 · 5 versions

[PDF] arxiv.org

Global convergence of multi-agent policy gradient in Markov potential games

S Leonardos, W Overman, I Panageas… - arxiv preprint arxiv …, 2021 - arxiv.org

Potential games are arguably one of the most important and widely studied classes of
normal form games. They define the archetypal setting of multi-agent coordination as all …

Cited by 141 · 7 versions

[PDF] neurips.cc

Combining deep reinforcement learning and search for imperfect-information games

N Brown, A Bakhtin, A Lerer… - Advances in Neural …, 2020 - proceedings.neurips.cc

The combination of deep reinforcement learning and search at both training and test time is
a powerful paradigm that has led to a number of successes in single-agent settings and …

Cited by 170 · 10 versions

[PDF] neurips.cc

Near-optimal no-regret learning in general games

C Daskalakis, M Fishelson… - Advances in Neural …, 2021 - proceedings.neurips.cc

We show that Optimistic Hedge--a common variant of multiplicative-weights-updates
with recency bias--attains $\mathrm{poly}(\log T)$ regret in multi-player general-sum …

Cited by 116 · 6 versions

[PDF] arxiv.org

When can we learn general-sum Markov games with a large number of players sample-efficiently?

Z Song, S Mei, Y Bai - arxiv preprint arxiv:2110.04184, 2021 - arxiv.org

Multi-agent reinforcement learning has made substantial empirical progress in solving
games with a large number of players. However, theoretically, the best known sample …

Cited by 115 · 3 versions

[PDF] neurips.cc

Model-based multi-agent RL in zero-sum Markov games with near-optimal sample complexity

K Zhang, S Kakade, T Basar… - Advances in Neural …, 2020 - proceedings.neurips.cc

Model-based reinforcement learning (RL), which finds an optimal policy using an
empirical model, has long been recognized as one of the cornerstones of RL. It is especially …

Cited by 155 · 12 versions

[PDF] nsf.gov

Online learning algorithms

N Cesa-Bianchi, F Orabona - Annual review of statistics and its …, 2021 - annualreviews.org

Online learning is a framework for the design and analysis of algorithms that build predictive
models by processing data one at a time. Besides being computationally efficient, online …

Cited by 46 · 6 versions

[PDF] mlr.press

Accelerated Algorithms for Smooth Convex-Concave Minimax Problems with O(1/k^2) Rate on Squared Gradient Norm

TH Yoon, EK Ryu - International Conference on Machine …, 2021 - proceedings.mlr.press

In this work, we study the computational complexity of reducing the squared gradient
magnitude for smooth minimax optimization problems. First, we present algorithms with …

Cited by 123 · 2 versions

Fast convergence of regularized learning in games

Nash learning from human feedback
