[BOOK][B] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
[PDF][PDF] On the complexity of best-arm identification in multi-armed bandit models
The stochastic multi-armed bandit model is a simple abstraction that has proven useful in
many different contexts in statistics and machine learning. Whereas the achievable limit in …
Batched multi-armed bandits problem
In this paper, we study the multi-armed bandit problem in the batched setting where the
employed policy must split data into a small number of batches. While the minimax regret for …
Batched bandit problems
The Annals of Statistics, 2016, Vol. 44, No. 2, 660–681. DOI: 10.1214/15-AOS1381. © Institute of Mathematical Statistics, 2016 …
Explore first, exploit next: The true shape of regret in bandit problems
We revisit lower bounds on the regret in the case of multiarmed bandit problems. We obtain
nonasymptotic, distribution-dependent bounds and provide simple proofs based only on …
Learning unknown service rates in queues: A multiarmed bandit approach
Consider a queueing system consisting of multiple servers. Jobs arrive over time and enter a
queue for service; the goal is to minimize the size of this queue. At each opportunity for …
Beating stochastic and adversarial semi-bandits optimally and simultaneously
We develop the first general semi-bandit algorithm that simultaneously achieves $\mathcal{O}(\log T)$ regret for stochastic environments and $\mathcal{O}(\sqrt{T})$ regret for …
On explore-then-commit strategies
We study the problem of minimising regret in two-armed bandit problems with Gaussian
rewards. Our objective is to use this simple setting to illustrate that strategies based on an …
Risk-averse multi-armed bandit problems under mean-variance measure
The multi-armed bandit (MAB) problems have been studied mainly under the measure of
expected total reward accrued over a horizon of length T. In this paper, we address the issue …
Online learning in repeated auctions
Motivated by online advertising auctions, we consider repeated Vickrey auctions where
goods of unknown value are sold sequentially and bidders only learn (potentially noisy) …