[BOOK] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

[PDF] On the complexity of best-arm identification in multi-armed bandit models

E Kaufmann, O Cappé, A Garivier - The Journal of Machine Learning …, 2016 - jmlr.org
The stochastic multi-armed bandit model is a simple abstraction that has proven useful in
many different contexts in statistics and machine learning. Whereas the achievable limit in …

Batched multi-armed bandits problem

Z Gao, Y Han, Z Ren, Z Zhou - Advances in Neural …, 2019 - proceedings.neurips.cc
In this paper, we study the multi-armed bandit problem in the batched setting where the
employed policy must split data into a small number of batches. While the minimax regret for …

Batched bandit problems

V Perchet, P Rigollet, S Chassang, E Snowberg - 2016 - projecteuclid.org
The Annals of Statistics 2016, Vol. 44, No. 2, 660–681. DOI: 10.1214/15-AOS1381.
© Institute of Mathematical Statistics, 2016 …

Explore first, exploit next: The true shape of regret in bandit problems

A Garivier, P Ménard, G Stoltz - Mathematics of Operations …, 2019 - pubsonline.informs.org
We revisit lower bounds on the regret in the case of multiarmed bandit problems. We obtain
nonasymptotic, distribution-dependent bounds and provide simple proofs based only on …

Learning unknown service rates in queues: A multiarmed bandit approach

S Krishnasamy, R Sen, R Johari… - Operations …, 2021 - pubsonline.informs.org
Consider a queueing system consisting of multiple servers. Jobs arrive over time and enter a
queue for service; the goal is to minimize the size of this queue. At each opportunity for …

Beating stochastic and adversarial semi-bandits optimally and simultaneously

J Zimmert, H Luo, CY Wei - International Conference on …, 2019 - proceedings.mlr.press
We develop the first general semi-bandit algorithm that simultaneously achieves
$\mathcal{O}(\log T)$ regret for stochastic environments and $\mathcal{O}(\sqrt{T})$ regret for …

On explore-then-commit strategies

A Garivier, T Lattimore… - Advances in Neural …, 2016 - proceedings.neurips.cc
We study the problem of minimising regret in two-armed bandit problems with Gaussian
rewards. Our objective is to use this simple setting to illustrate that strategies based on an …

Risk-averse multi-armed bandit problems under mean-variance measure

S Vakili, Q Zhao - IEEE Journal of Selected Topics in Signal …, 2016 - ieeexplore.ieee.org
The multi-armed bandit (MAB) problems have been studied mainly under the measure of
expected total reward accrued over a horizon of length T. In this paper, we address the issue …

Online learning in repeated auctions

J Weed, V Perchet, P Rigollet - Conference on Learning …, 2016 - proceedings.mlr.press
Motivated by online advertising auctions, we consider repeated Vickrey auctions where
goods of unknown value are sold sequentially and bidders only learn (potentially noisy) …