[BOOK] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

More adaptive algorithms for adversarial bandits

CY Wei, H Luo - Conference On Learning Theory, 2018 - proceedings.mlr.press
We develop a novel and generic algorithm for the adversarial multi-armed bandit problem
(or more generally the combinatorial semi-bandit problem). When instantiated differently, our …

What doubling tricks can and can't do for multi-armed bandits

L Besson, E Kaufmann - arxiv preprint arxiv:1803.06971, 2018 - arxiv.org
An online reinforcement learning algorithm is anytime if it does not need to know in advance
the horizon T of the experiment. A well-known technique to obtain an anytime algorithm from …
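The doubling trick referred to in this snippet is the standard one: run a fixed-horizon algorithm on geometrically growing horizons 1, 2, 4, 8, …, restarting it fresh at the start of each phase, so the wrapper never needs to know the true horizon in advance. A minimal Python sketch, assuming a hypothetical fixed-horizon interface with `select_arm()` and `update(arm, reward)`:

```python
def run_anytime(make_alg, pull, total_rounds):
    """Doubling trick: restart a fixed-horizon bandit algorithm on
    horizons 1, 2, 4, 8, ... so the resulting procedure is anytime,
    i.e. it never needs total_rounds in advance.

    make_alg(T) returns a fresh fixed-horizon algorithm (hypothetical
    interface: select_arm() and update(arm, reward)); pull(arm) is the
    environment's reward feedback.
    """
    t, i = 0, 0
    while t < total_rounds:
        horizon = 2 ** i
        alg = make_alg(horizon)  # fresh start: all past data is forgotten
        for _ in range(min(horizon, total_rounds - t)):
            arm = alg.select_arm()
            alg.update(arm, pull(arm))
            t += 1
        i += 1
```

Each restart discards the observations gathered in earlier phases; the regret cost of that forgetting is the kind of question the paper above examines.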

SIC-MMAB: Synchronisation involves communication in multiplayer multi-armed bandits

E Boursier, V Perchet - Advances in Neural Information …, 2019 - proceedings.neurips.cc
Motivated by cognitive radio networks, we consider the stochastic multiplayer multi-armed
bandit problem, where several players pull arms simultaneously and collisions occur if one …

Thompson sampling with less exploration is fast and optimal

T Jin, X Yang, X Xiao, P Xu - International Conference on …, 2023 - proceedings.mlr.press
We propose $\epsilon$-Exploring Thompson Sampling ($\epsilon$-TS), a
modified version of the Thompson Sampling (TS) algorithm for multi-armed bandits. In …
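The truncated abstract does not spell out the modification, so the following is only a plausible sketch for Bernoulli rewards, under the assumption that $\epsilon$-TS draws posterior samples with probability $\epsilon$ and otherwise acts greedily on the posterior mean; the function name and interface are illustrative, not from the paper:

```python
import random

def eps_ts_select(successes, failures, eps=0.1, rng=random):
    """Illustrative epsilon-TS arm selection for Bernoulli bandits under
    a Beta(1,1) prior (name and interface are hypothetical): with
    probability eps draw one posterior sample per arm as in vanilla TS;
    otherwise act greedily on the posterior mean."""
    if rng.random() < eps:
        scores = [rng.betavariate(s + 1, f + 1)
                  for s, f in zip(successes, failures)]
    else:
        scores = [(s + 1) / (s + f + 2)  # Beta(1+s, 1+f) posterior mean
                  for s, f in zip(successes, failures)]
    return max(range(len(scores)), key=scores.__getitem__)
```

Setting `eps=1.0` recovers ordinary posterior sampling; `eps=0.0` is pure greedy on the posterior mean.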

Stochastic multi-armed bandits with strongly reward-dependent delays

Y Tang, Y Wang, Z Zheng - International Conference on …, 2024 - proceedings.mlr.press
There has been increasing interest in applying multi-armed bandits to adaptive designs in
clinical trials. However, most literature assumes that a previous patient's survival response of …

Statistical efficiency of Thompson sampling for combinatorial semi-bandits

P Perrault, E Boursier, M Valko… - Advances in Neural …, 2020 - proceedings.neurips.cc
We investigate stochastic combinatorial multi-armed bandit with semi-bandit feedback
(CMAB). In CMAB, the question of the existence of an efficient policy with an optimal …

Learning in repeated auctions

T Nedelec, C Calauzènes, N El Karoui… - … and Trends® in …, 2022 - nowpublishers.com
Online auctions are one of the most fundamental facets of the modern economy and power
an industry generating hundreds of billions of dollars a year in revenue. Auction theory has …

Finite-time regret of Thompson sampling algorithms for exponential family multi-armed bandits

T Jin, P Xu, X Xiao… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits,
where the reward distribution is from a one-dimensional exponential family, which covers …

MOTS: Minimax optimal Thompson sampling

T Jin, P Xu, J Shi, X Xiao, Q Gu - … Conference on Machine …, 2021 - proceedings.mlr.press
Thompson sampling is one of the most widely used algorithms in many online decision
problems due to its simplicity for implementation and superior empirical performance over …
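For reference, the simplicity the snippet mentions is visible in the Beta-Bernoulli case, where standard Thompson sampling fits in a few lines (a generic illustration, not the minimax-optimal MOTS variant):

```python
import random

def thompson_bernoulli(probs, horizon, seed=0):
    """Plain Beta-Bernoulli Thompson sampling: keep a Beta(1+s, 1+f)
    posterior per arm, sample once from each posterior, pull the
    arg-max arm, then update that arm's success/failure counts."""
    rng = random.Random(seed)
    k = len(probs)
    succ, fail = [0] * k, [0] * k
    for _ in range(horizon):
        samples = [rng.betavariate(1 + succ[a], 1 + fail[a])
                   for a in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        if rng.random() < probs[arm]:  # simulated Bernoulli reward
            succ[arm] += 1
        else:
            fail[arm] += 1
    return succ, fail
```

On a two-armed instance with means 0.9 and 0.1, the posterior concentrates quickly and the better arm receives the large majority of the 500 pulls.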