Federated linear contextual bandits

R Huang, W Wu, J Yang… - Advances in neural …, 2021 - proceedings.neurips.cc
This paper presents a novel federated linear contextual bandits model, where individual
clients face different $K$-armed stochastic bandits coupled through common global …

Efficient and targeted COVID-19 border testing via reinforcement learning

H Bastani, K Drakopoulos, V Gupta, I Vlachogiannis… - Nature, 2021 - nature.com
Throughout the coronavirus disease 2019 (COVID-19) pandemic, countries have relied on a
variety of ad hoc border control protocols to allow for non-essential travel while safeguarding …

Customer acquisition via display advertising using multi-armed bandit experiments

EM Schwartz, ET Bradlow, PS Fader - Marketing Science, 2017 - pubsonline.informs.org
Firms using online advertising regularly run experiments with multiple versions of their ads
since they are uncertain about which ones are most effective. During a campaign, firms try to …

Batched multi-armed bandits problem

Z Gao, Y Han, Z Ren, Z Zhou - Advances in Neural …, 2019 - proceedings.neurips.cc
In this paper, we study the multi-armed bandit problem in the batched setting where the
employed policy must split data into a small number of batches. While the minimax regret for …

Bypassing the monster: A faster and simpler optimal algorithm for contextual bandits under realizability

D Simchi-Levi, Y Xu - Mathematics of Operations Research, 2022 - pubsonline.informs.org
We consider the general (stochastic) contextual bandit problem under the realizability
assumption, that is, the expected reward, as a function of contexts and actions, belongs to a …

Inference for batched bandits

K Zhang, L Janson, S Murphy - Advances in neural …, 2020 - proceedings.neurips.cc
As bandit algorithms are increasingly utilized in scientific studies and industrial applications,
there is an associated increasing need for reliable inference methods based on the resulting …

Provably efficient Q-learning with low switching cost

Y Bai, T Xie, N Jiang, YX Wang - Advances in Neural …, 2019 - proceedings.neurips.cc
We take initial steps in studying PAC-MDP algorithms with limited adaptivity, that is,
algorithms that change their exploration policy as infrequently as possible during regret …

Decentralized cooperative stochastic bandits

D Martínez-Rubio, V Kanade… - Advances in Neural …, 2019 - proceedings.neurips.cc
We study a decentralized cooperative stochastic multi-armed bandit problem with K arms on
a network of N agents. In our model, the reward distribution of each arm is the same for each …

Bandits with delayed, aggregated anonymous feedback

C Pike-Burke, S Agrawal… - International …, 2018 - proceedings.mlr.press
We study a variant of the stochastic $K$-armed bandit problem, which we call "bandits with
delayed, aggregated anonymous feedback". In this problem, when the player pulls an arm, a …

SIC-MMAB: Synchronisation involves communication in multiplayer multi-armed bandits

E Boursier, V Perchet - Advances in Neural Information …, 2019 - proceedings.neurips.cc
Motivated by cognitive radio networks, we consider the stochastic multiplayer multi-armed
bandit problem, where several players pull arms simultaneously and collisions occur if one …