[BOOK] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Regret analysis of stochastic and nonstochastic multi-armed bandit problems
Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …
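To make the exploration-exploitation trade-off concrete, here is a minimal sketch of the standard UCB1 index policy on Bernoulli arms. It is an illustration only, not code from the survey. The function name `ucb1` and the arm means are hypothetical. The index adds an empirical mean, which drives exploitation, to a confidence width that shrinks as an arm is pulled more, which drives exploration.

```python
import math
import random


def ucb1(means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms with hypothetical success probabilities `means`.

    Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k   # pulls per arm
    sums = [0.0] * k   # cumulative reward per arm

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # pull every arm once so each estimate is defined
        else:
            # Empirical mean (exploitation) + confidence width (exploration).
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts


counts = ucb1([0.2, 0.5, 0.7], horizon=5000)
print(counts)  # most pulls should go to the last (best) arm
```

Suboptimal arms are still pulled occasionally. Their confidence width grows slowly with log t, so the policy never stops exploring entirely.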
Stochastic gradient methods for distributionally robust optimization with f-divergences
We develop efficient solution methods for a robust empirical risk minimization problem
designed to give calibrated confidence intervals on performance and provide optimal …
Thompson sampling: An asymptotically optimal finite-time analysis
The question of the optimality of Thompson Sampling for solving the stochastic multi-armed
bandit problem had been open since 1933. In this paper we answer it positively for the case …
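For readers unfamiliar with the algorithm being analysed, here is a minimal sketch of Thompson Sampling. It assumes Bernoulli rewards and a uniform Beta(1, 1) prior on each arm. This setting is chosen for illustration and is not taken from the truncated abstract. The function name `thompson_bernoulli` and the arm means are hypothetical. At each step the sketch draws one sample from each arm's Beta posterior, plays the arm with the largest draw, and then updates that arm's success or failure count.

```python
import random


def thompson_bernoulli(means, horizon, seed=0):
    """Beta-Bernoulli Thompson Sampling with a Beta(1, 1) prior on every arm.

    `means` are hypothetical true success probabilities, used only to simulate
    rewards. Returns the number of times each arm was pulled.
    """
    rng = random.Random(seed)
    k = len(means)
    successes = [0] * k
    failures = [0] * k
    counts = [0] * k

    for _ in range(horizon):
        # One posterior draw per arm: Beta(successes + 1, failures + 1).
        samples = [rng.betavariate(successes[i] + 1, failures[i] + 1) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Simulate a Bernoulli reward and update that arm's posterior counts.
        if rng.random() < means[arm]:
            successes[arm] += 1
        else:
            failures[arm] += 1
        counts[arm] += 1
    return counts


ts_counts = thompson_bernoulli([0.2, 0.5, 0.7], horizon=2000)
print(ts_counts)  # the best arm (index 2) should dominate
```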
The KL-UCB algorithm for bounded stochastic bandits and beyond
This paper presents a finite-time analysis of the KL-UCB algorithm, an online, horizon-free
index policy for stochastic bandit problems. We prove two distinct results: first, for arbitrary …
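An index policy assigns each arm a number computed from its own statistics and plays the arm with the largest one. The sketch below computes a KL-UCB style index for Bernoulli rewards. The index is the largest q ≥ p̂ such that n · kl(p̂, q) ≤ log t, found by bisection. The exploration budget log t, with no extra log log t term, and the helper names `bernoulli_kl` and `kl_ucb_index` are assumptions made for illustration. This is not the paper's exact specification.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_ucb_index(mean, pulls, t, tol=1e-6):
    """Largest q in [mean, 1] with pulls * kl(mean, q) <= log(t), via bisection.

    Assumes Bernoulli rewards and an exploration budget of log(t).
    """
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    # kl(mean, q) increases in q on [mean, 1], so bisection finds the boundary.
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if bernoulli_kl(mean, mid) <= budget:
            lo = mid  # mid is still inside the confidence region
        else:
            hi = mid
    return lo


idx = kl_ucb_index(0.5, pulls=100, t=1000)
print(idx)  # an optimistic estimate somewhat above the empirical mean 0.5
```

By Pinsker's inequality, kl(p, q) ≥ 2(p − q)², so this index never exceeds the Hoeffding-style bound p̂ + sqrt(log t / (2n)). This is one reason KL-UCB tends to be less over-optimistic than UCB1.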
Kullback-Leibler upper confidence bounds for optimal sequential allocation
We consider optimal sequential allocation in the context of the so-called stochastic multi-armed
bandit model. We describe a generic index policy, in the sense of Gittins [JR Stat …
Batched bandit problems
The Annals of Statistics, 2016, Vol. 44, No. 2, 660–681. DOI: 10.1214/15-AOS1381.
© Institute of Mathematical Statistics, 2016.
Batched multi-armed bandits problem
In this paper, we study the multi-armed bandit problem in the batched setting where the
employed policy must split data into a small number of batches. While the minimax regret for …
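In the batched setting, a policy may change its behaviour only at a small number of batch boundaries. The sketch below illustrates this constraint with successive elimination. Within each batch, the policy cycles through the currently active arms. Arms whose upper confidence bound falls below the best lower confidence bound are dropped only at the end of a batch. The geometric batch grid, the confidence radius, and the name `batched_elimination` are all hypothetical choices made for illustration. They are not the policies analysed in either batched-bandit paper.

```python
import math
import random


def batched_elimination(means, horizon, num_batches, seed=0):
    """Successive elimination in which arms can be dropped only between batches.

    Hypothetical setup: Bernoulli arms with the given means and a geometric grid
    of batch end-points. Returns (arms still active at the end, pulls per arm).
    """
    rng = random.Random(seed)
    k = len(means)
    active = list(range(k))
    counts = [0] * k
    sums = [0.0] * k

    def mean(i):
        return sums[i] / max(counts[i], 1)

    def radius(i):
        # Hoeffding-style confidence radius; an assumed, untuned choice.
        return math.sqrt(2 * math.log(horizon) / max(counts[i], 1))

    # Batch j ends at about horizon ** (j / num_batches); the last one ends at horizon.
    ends = [int(horizon ** (j / num_batches)) for j in range(1, num_batches + 1)]
    ends[-1] = horizon

    t = 0
    for end in ends:
        # Inside a batch the policy is fixed: round-robin over the active arms.
        while t < end:
            arm = active[t % len(active)]
            sums[arm] += 1.0 if rng.random() < means[arm] else 0.0
            counts[arm] += 1
            t += 1
        # Batch boundary: keep only arms whose UCB reaches the best LCB.
        best_lcb = max(mean(i) - radius(i) for i in active)
        active = [i for i in active if mean(i) + radius(i) >= best_lcb]
    return active, counts


survivors, pulls = batched_elimination([0.2, 0.5, 0.8], horizon=10_000, num_batches=3)
print(survivors, pulls)
```

The arm that attains the best lower bound always satisfies its own test, so the active set can never become empty.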
Corralling a band of bandit algorithms
We study the problem of combining multiple bandit algorithms (that is, online learning
algorithms with partial feedback) with the goal of creating a master algorithm that performs …