- Academic Search

[BOOK] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 3283

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com

Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Cited by 1253

Regret analysis of stochastic and nonstochastic multi-armed bandit problems

S Bubeck, N Cesa-Bianchi - Foundations and Trends® in …, 2012 - nowpublishers.com

Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …

Cited by 3282

Stochastic gradient methods for distributionally robust optimization with f-divergences

H Namkoong, JC Duchi - Advances in neural information …, 2016 - proceedings.neurips.cc

We develop efficient solution methods for a robust empirical risk minimization problem
designed to give calibrated confidence intervals on performance and provide optimal …

Cited by 386

Thompson sampling: An asymptotically optimal finite-time analysis

E Kaufmann, N Korda, R Munos - International conference on algorithmic …, 2012 - Springer

The question of the optimality of Thompson Sampling for solving the stochastic multi-armed
bandit problem had been open since 1933. In this paper we answer it positively for the case …

Cited by 805

The KL-UCB algorithm for bounded stochastic bandits and beyond

A Garivier, O Cappé - … of the 24th annual conference on …, 2011 - proceedings.mlr.press

This paper presents a finite-time analysis of the KL-UCB algorithm, an online, horizon-free
index policy for stochastic bandit problems. We prove two distinct results: first, for arbitrary …

Cited by 784

Kullback-Leibler upper confidence bounds for optimal sequential allocation

O Cappé, A Garivier, OA Maillard, R Munos… - The Annals of …, 2013 - JSTOR

We consider optimal sequential allocation in the context of the so-called stochastic
multi-armed bandit model. We describe a generic index policy, in the sense of Gittins [JR Stat …

Cited by 448

Batched bandit problems

V Perchet, P Rigollet, S Chassang, E Snowberg - 2016 - projecteuclid.org

The Annals of Statistics 2016, Vol. 44, No. 2, 660–681. DOI: 10.1214/15-AOS1381.
© Institute of Mathematical Statistics, 2016.

Cited by 278

Batched multi-armed bandits problem

Z Gao, Y Han, Z Ren, Z Zhou - Advances in Neural …, 2019 - proceedings.neurips.cc

In this paper, we study the multi-armed bandit problem in the batched setting where the
employed policy must split data into a small number of batches. While the minimax regret for …

Cited by 172

Corralling a band of bandit algorithms

A Agarwal, H Luo, B Neyshabur… - … on Learning Theory, 2017 - proceedings.mlr.press

We study the problem of combining multiple bandit algorithms (that is, online learning
algorithms with partial feedback) with the goal of creating a master algorithm that performs …

Cited by 197
