[BOOK] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits is a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Regret analysis of stochastic and nonstochastic multi-armed bandit problems

S Bubeck, N Cesa-Bianchi - Foundations and Trends® in …, 2012 - nowpublishers.com
Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …

Stochastic gradient methods for distributionally robust optimization with f-divergences

H Namkoong, JC Duchi - Advances in neural information …, 2016 - proceedings.neurips.cc
We develop efficient solution methods for a robust empirical risk minimization problem
designed to give calibrated confidence intervals on performance and provide optimal …

Thompson sampling: An asymptotically optimal finite-time analysis

E Kaufmann, N Korda, R Munos - International conference on algorithmic …, 2012 - Springer
The question of the optimality of Thompson Sampling for solving the stochastic multi-armed
bandit problem had been open since 1933. In this paper we answer it positively for the case …
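As a point of reference for the entry above, here is a minimal sketch of Thompson Sampling on simulated Bernoulli arms. It assumes uniform Beta(1, 1) priors. The function name `thompson_sampling`, the arm means, and the horizon are hypothetical and chosen only for illustration. This is not the paper's analysis or code.

```python
import random


def thompson_sampling(arm_means, horizon, seed=0):
    """Bernoulli Thompson Sampling with Beta(1, 1) priors (illustrative sketch).

    arm_means are hypothetical true success probabilities used only to
    simulate rewards; the algorithm itself never reads them directly.
    Returns the number of times each arm was played.
    """
    rng = random.Random(seed)
    k = len(arm_means)
    successes = [0] * k
    failures = [0] * k
    pulls = [0] * k
    for _ in range(horizon):
        # Posterior for arm i is Beta(successes + 1, failures + 1);
        # draw one sample per arm and play the arm with the largest draw.
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Simulate a Bernoulli reward from the arm's (hypothetical) mean.
        reward = 1 if rng.random() < arm_means[arm] else 0
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
    return pulls
```

Calling `thompson_sampling([0.2, 0.5, 0.8], 2000)` returns per-arm pull counts. With gaps this large, the bulk of the pulls should go to the last arm, since posterior draws for the weaker arms rarely exceed its draw once a few rewards have been observed.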

The KL-UCB algorithm for bounded stochastic bandits and beyond

A Garivier, O Cappé - … of the 24th annual conference on …, 2011 - proceedings.mlr.press
This paper presents a finite-time analysis of the KL-UCB algorithm, an online, horizon-free
index policy for stochastic bandit problems. We prove two distinct results: first, for arbitrary …
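To make "index policy" concrete, here is a minimal sketch of a KL-UCB-style index for rewards in [0, 1]. It assumes the Bernoulli KL divergence and an exploration threshold of log(t). The paper's exact threshold may include additional terms, and those are not reproduced here. The names `bernoulli_kl`, `kl_ucb_index`, and `kl_ucb_choose` are hypothetical.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, pulls, t, precision=1e-6):
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= log(t), via bisection.

    The log(t) exploration threshold is an assumption of this sketch.
    kl(mean, q) is increasing in q for q >= mean, so bisection applies.
    """
    threshold = math.log(t) / pulls
    low, high = mean, 1.0
    while high - low > precision:
        mid = (low + high) / 2
        if bernoulli_kl(mean, mid) > threshold:
            high = mid  # mid is outside the confidence region
        else:
            low = mid  # mid is still plausible; search higher
    return low


def kl_ucb_choose(means, pulls, t):
    """Play each arm once, then the arm with the largest KL-UCB index."""
    for arm, n in enumerate(pulls):
        if n == 0:
            return arm
    return max(range(len(means)), key=lambda a: kl_ucb_index(means[a], pulls[a], t))
```

The index shrinks toward the empirical mean as an arm is pulled more often, which is what lets the policy stop exploring arms that are confidently worse.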

Kullback-Leibler upper confidence bounds for optimal sequential allocation

O Cappé, A Garivier, OA Maillard, R Munos… - The Annals of …, 2013 - JSTOR
We consider optimal sequential allocation in the context of the so-called stochastic multi-
armed bandit model. We describe a generic index policy, in the sense of Gittins [JR Stat …

Batched bandit problems

V Perchet, P Rigollet, S Chassang, E Snowberg - 2016 - projecteuclid.org
The Annals of Statistics 2016, Vol. 44, No. 2, 660–681. DOI: 10.1214/15-AOS1381. © Institute of Mathematical Statistics, 2016.

Batched multi-armed bandits problem

Z Gao, Y Han, Z Ren, Z Zhou - Advances in Neural …, 2019 - proceedings.neurips.cc
In this paper, we study the multi-armed bandit problem in the batched setting where the
employed policy must split data into a small number of batches. While the minimax regret for …

Corralling a band of bandit algorithms

A Agarwal, H Luo, B Neyshabur… - … on Learning Theory, 2017 - proceedings.mlr.press
We study the problem of combining multiple bandit algorithms (that is, online learning
algorithms with partial feedback) with the goal of creating a master algorithm that performs …