Hierarchical Bayesian bandits

J Hong, B Kveton, M Zaheer… - International …, 2022 - proceedings.mlr.press
Meta-, multi-task, and federated learning can all be viewed as solving similar tasks,
drawn from a distribution that reflects task similarities. We provide a unified view of all these …

Adaptivity and confounding in multi-armed bandit experiments

C Qin, D Russo - arXiv preprint arXiv:2202.09036, 2022 - aeaweb.org
We explore a new model of bandit experiments where a potentially nonstationary sequence
of contexts influences arms' performance. Context-unaware algorithms risk confounding …

Mixed-effect Thompson sampling

I Aouali, B Kveton, S Katariya - International Conference on …, 2023 - proceedings.mlr.press
A contextual bandit is a popular framework for online learning to act under uncertainty. In
practice, the number of actions is huge and their expected rewards are correlated. In this …

Multi-task off-policy learning from bandit feedback

J Hong, B Kveton, M Zaheer… - International …, 2023 - proceedings.mlr.press
Many practical problems involve solving similar tasks. In recommender systems, the tasks
can be users with similar preferences; in search engines, the tasks can be items with similar …

Transportability for bandits with data from different environments

A Bellot, A Malek, S Chiappa - Advances in Neural …, 2023 - proceedings.neurips.cc
A unifying theme in the design of intelligent agents is to efficiently optimize a policy based on
what prior knowledge of the problem is available and what actions can be taken to learn …

Lifelong bandit optimization: no prior and no regret

F Schur, P Kassraie, J Rothfuss… - Uncertainty in Artificial …, 2023 - proceedings.mlr.press
Machine learning algorithms are often repeatedly applied to problems with similar
structure over and over again. We focus on solving a sequence of bandit optimization tasks …

Meta Learning in Bandits within Shared Affine Subspaces

S Bilaj, S Dhouib, S Maghsudi - International Conference on …, 2024 - proceedings.mlr.press
We study the problem of meta-learning several contextual stochastic bandits tasks by
leveraging their concentration around a low dimensional affine subspace, which we learn …

Thompson sampling with diffusion generative prior

YG Hsieh, SP Kasiviswanathan, B Kveton… - arXiv preprint arXiv …, 2023 - arxiv.org
In this work, we initiate the idea of using denoising diffusion models to learn priors for online
decision making problems. Our special focus is on the meta-learning for bandit framework …

Thompson sampling for robust transfer in multi-task bandits

Z Wang, C Zhang, K Chaudhuri - arXiv preprint arXiv:2206.08556, 2022 - arxiv.org
We study the problem of online multi-task learning where the tasks are performed within
similar but not necessarily identical multi-armed bandit environments. In particular, we study …

Prior-dependent allocations for Bayesian fixed-budget best-arm identification in structured bandits

N Nguyen, I Aouali, A György, C Vernade - arXiv preprint arXiv …, 2024 - arxiv.org
We study the problem of Bayesian fixed-budget best-arm identification (BAI) in structured
bandits. We propose an algorithm that uses fixed allocations based on the prior information …