Federated linear contextual bandits
This paper presents a novel federated linear contextual bandits model, where individual
clients face different $K$-armed stochastic bandits coupled through common global …
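As a point of reference for the model (an illustrative sketch, not the paper's federated algorithm), the single-client building block is the standard linear contextual bandit. A minimal LinUCB-style loop might look like the following; the dimension `d`, exploration weight `alpha`, and the callback names `contexts` and `rewards_fn` are arbitrary choices for illustration.

```python
import numpy as np

# Minimal single-client LinUCB sketch (illustrative only; not the federated
# algorithm from the paper). Assumes K arms, d-dimensional contexts, and an
# arbitrary exploration weight alpha.
def linucb(contexts, rewards_fn, K, d, T, alpha=1.0):
    A = [np.eye(d) for _ in range(K)]    # per-arm ridge-regression Gram matrices
    b = [np.zeros(d) for _ in range(K)]  # per-arm response vectors
    total_reward = 0.0
    for t in range(T):
        x = contexts(t)                  # context vector observed at round t
        ucb = np.zeros(K)
        for a in range(K):
            A_inv = np.linalg.inv(A[a])
            theta = A_inv @ b[a]         # least-squares estimate for arm a
            ucb[a] = theta @ x + alpha * np.sqrt(x @ A_inv @ x)
        arm = int(np.argmax(ucb))
        r = rewards_fn(arm, x)           # observe a stochastic reward
        A[arm] += np.outer(x, x)         # update only the pulled arm's statistics
        b[arm] += r * x
        total_reward += r
    return total_reward
```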
Efficient and targeted COVID-19 border testing via reinforcement learning
Throughout the coronavirus disease 2019 (COVID-19) pandemic, countries have relied on a
variety of ad hoc border control protocols to allow for non-essential travel while safeguarding …
Customer acquisition via display advertising using multi-armed bandit experiments
Firms using online advertising regularly run experiments with multiple versions of their ads
since they are uncertain about which ones are most effective. During a campaign, firms try to …
Batched multi-armed bandits problem
In this paper, we study the multi-armed bandit problem in the batched setting where the
employed policy must split data into a small number of batches. While the minimax regret for …
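To make the batch constraint concrete (an illustrative sketch, not the minimax-optimal policy studied in the paper), one can run successive elimination in which arm statistics are frozen within a batch and eliminations happen only at the M batch boundaries; the Hoeffding-style constants and equal per-batch allocation below are arbitrary.

```python
import numpy as np

# Illustrative batched successive-elimination sketch. All surviving arms are
# pulled equally often within a batch, and arms are eliminated only at the
# M batch boundaries, so the policy adapts at most M times.
def batched_elimination(pull, K, T, M=4):
    active = list(range(K))
    means = np.zeros(K)
    counts = np.zeros(K)
    for _ in range(M):
        pulls_each = max((T // M) // len(active), 1)
        for a in active:
            rewards = [pull(a) for _ in range(pulls_each)]
            means[a] = (means[a] * counts[a] + sum(rewards)) / (counts[a] + pulls_each)
            counts[a] += pulls_each
        # confidence radius from a Hoeffding bound (illustrative constants)
        rad = {a: np.sqrt(2.0 * np.log(max(T, 2)) / counts[a]) for a in active}
        best_lcb = max(means[a] - rad[a] for a in active)
        active = [a for a in active if means[a] + rad[a] >= best_lcb]
    return active, means, counts
```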
Bypassing the monster: A faster and simpler optimal algorithm for contextual bandits under realizability
We consider the general (stochastic) contextual bandit problem under the realizability
assumption, that is, the expected reward, as a function of contexts and actions, belongs to a …
Inference for batched bandits
As bandit algorithms are increasingly utilized in scientific studies and industrial applications,
there is an associated increasing need for reliable inference methods based on the resulting …
Provably efficient q-learning with low switching cost
We take initial steps in studying PAC-MDP algorithms with limited adaptivity, that is,
algorithms that change their exploration policy as infrequently as possible during regret …
Decentralized cooperative stochastic bandits
We study a decentralized cooperative stochastic multi-armed bandit problem with K arms on
a network of N agents. In our model, the reward distribution of each arm is the same for each …
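One simple way to picture the cooperation (a sketch under my own assumptions, not the paper's algorithm) is each agent running UCB locally and averaging its running statistics with its neighbours after every round via an assumed doubly stochastic mixing matrix `W`; since the arm distributions are the same for every agent, averaging the sufficient statistics is a natural form of cooperation.

```python
import numpy as np

# Illustrative gossip-style cooperative UCB sketch: N agents, K arms.
# pull(i, arm) is an assumed callback returning agent i's reward; W is an
# assumed N x N doubly stochastic matrix encoding the communication network.
def cooperative_ucb(pull, K, N, T, W):
    sums = np.zeros((N, K))    # per-agent estimated cumulative rewards
    counts = np.ones((N, K))   # per-agent estimated pull counts (initialised to 1)
    for t in range(1, T + 1):
        for i in range(N):
            ucb = sums[i] / counts[i] + np.sqrt(2 * np.log(t + 1) / counts[i])
            arm = int(np.argmax(ucb))
            r = pull(i, arm)
            sums[i, arm] += r
            counts[i, arm] += 1
        sums = W @ sums        # one gossip/consensus step with the neighbours
        counts = W @ counts
    return sums / counts       # each agent's final mean estimates
```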
Bandits with delayed, aggregated anonymous feedback
We study a variant of the stochastic $K$-armed bandit problem, which we call "bandits with
delayed, aggregated anonymous feedback". In this problem, when the player pulls an arm, a …
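To make the feedback model concrete (my reading of the setup, sketched under assumed Bernoulli rewards and a bounded random delay; not code from the paper), an environment where each round reveals only the anonymous sum of whatever rewards happen to arrive could look like this:

```python
import numpy as np

# Illustrative environment for delayed, aggregated, anonymous feedback:
# each pull's reward arrives after a random delay, and the player only ever
# observes the SUM of rewards arriving in the current round, without knowing
# which past pulls generated them.
class DelayedAggregatedAnonymousEnv:
    def __init__(self, arm_means, max_delay=5, seed=0):
        self.rng = np.random.default_rng(seed)
        self.arm_means = np.asarray(arm_means, dtype=float)
        self.max_delay = max_delay
        self.pending = {}   # arrival round -> accumulated reward sum
        self.t = 0

    def pull(self, arm):
        # reward is generated now but only observed after a random delay
        reward = float(self.rng.random() < self.arm_means[arm])  # Bernoulli reward
        arrive = self.t + int(self.rng.integers(0, self.max_delay + 1))
        self.pending[arrive] = self.pending.get(arrive, 0.0) + reward
        observed = self.pending.pop(self.t, 0.0)                 # anonymous aggregate
        self.t += 1
        return observed
```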
SIC-MMAB: Synchronisation involves communication in multiplayer multi-armed bandits
Motivated by cognitive radio networks, we consider the stochastic multiplayer multi-armed
bandit problem, where several players pull arms simultaneously and collisions occur if one …
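The collision mechanism itself is easy to state in code (an illustrative sketch of the standard collision model, not the SIC-MMAB protocol): players who choose the same arm in a round collide and receive zero reward, while the others draw from their arm's distribution.

```python
import numpy as np

# Illustrative multiplayer collision model with Bernoulli rewards.
# choices[i] is the arm pulled by player i in this round; colliding players
# (two or more on the same arm) receive zero reward.
def multiplayer_round(arm_means, choices, rng):
    choices = np.asarray(choices)
    rewards = np.zeros(len(choices))
    for i, arm in enumerate(choices):
        collided = np.sum(choices == arm) > 1
        if not collided:
            rewards[i] = float(rng.random() < arm_means[arm])
    return rewards

# Example: 3 players, 4 arms; players 0 and 1 collide on arm 2 and get 0.
rng = np.random.default_rng(0)
print(multiplayer_round([0.1, 0.5, 0.7, 0.9], [2, 2, 3], rng))
```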