[BOOK] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Beyond UCB: Optimal and efficient contextual bandits with regression oracles

D Foster, A Rakhlin - International Conference on Machine …, 2020 - proceedings.mlr.press
A fundamental challenge in contextual bandits is to develop flexible, general-purpose
algorithms with computational requirements no worse than classical supervised learning …
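
This paper's key device, inverse-gap weighting over a regression oracle's predicted rewards, is easy to sketch. Below is a minimal Python illustration; the toy oracle predictions, the number of arms, and the learning-rate parameter gamma are illustrative assumptions, not the paper's tuned choices.

    import numpy as np

    def inverse_gap_weighting(predicted_rewards, gamma):
        """Map a regression oracle's predicted rewards to an action
        distribution: every non-best arm gets probability inversely
        proportional to its predicted gap from the best arm, and the
        empirically best arm keeps the leftover mass."""
        mu = np.asarray(predicted_rewards, dtype=float)
        K = len(mu)
        best = int(np.argmax(mu))
        probs = np.empty(K)
        others = np.arange(K) != best
        probs[others] = 1.0 / (K + gamma * (mu[best] - mu[others]))
        probs[best] = 1.0 - probs[others].sum()
        return probs

    # Toy usage: four arms; larger gamma makes the policy greedier.
    rng = np.random.default_rng(0)
    p = inverse_gap_weighting([0.9, 0.7, 0.5, 0.1], gamma=10.0)
    action = rng.choice(4, p=p)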

Randomized exploration in generalized linear bandits

B Kveton, M Zaheer, C Szepesvári… - International …, 2020 - proceedings.mlr.press
We study two randomized algorithms for generalized linear bandits. The first, GLM-TSL,
samples a generalized linear model (GLM) from the Laplace approximation to the posterior …
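
The snippet already names the mechanism: draw one model from a Laplace (Gaussian) approximation to the GLM posterior and act greedily on it. A minimal sketch for the logistic case follows; the Newton solver, the regularizer lam, and the function interface are assumptions for illustration, not the paper's exact GLM-TSL specification.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def laplace_ts_action(X_hist, y_hist, arms, lam=1.0, steps=20, rng=None):
        """Pick an arm by Thompson sampling with a Laplace approximation:
        fit the MAP estimate of a logistic model, approximate the posterior
        by N(theta_MAP, H^{-1}), draw one sample, and act greedily."""
        rng = rng or np.random.default_rng()
        d = arms.shape[1]
        theta = np.zeros(d)
        for _ in range(steps):                      # Newton's method for the MAP
            p = sigmoid(X_hist @ theta)
            grad = X_hist.T @ (p - y_hist) + lam * theta
            H = X_hist.T @ ((p * (1 - p))[:, None] * X_hist) + lam * np.eye(d)
            theta -= np.linalg.solve(H, grad)
        p = sigmoid(X_hist @ theta)                 # Hessian at the MAP
        H = X_hist.T @ ((p * (1 - p))[:, None] * X_hist) + lam * np.eye(d)
        sample = rng.multivariate_normal(theta, np.linalg.inv(H))
        return int(np.argmax(arms @ sample))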

Optimistic posterior sampling for reinforcement learning with few samples and tight guarantees

D Tiapkin, D Belomestny… - Advances in …, 2022 - proceedings.neurips.cc
We consider reinforcement learning in an environment modeled by an episodic, tabular,
step-dependent Markov decision process of horizon $H$ with $S$ states, and $A$ …
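
The skeleton behind this line of work is simple to state: each episode, draw a plausible MDP from the posterior and play its optimal policy. A hedged sketch below; the Dirichlet prior, the posterior-mean rewards, and the array layout are illustrative assumptions, and the paper's contribution is an optimistic modification with tighter guarantees, not this plain loop.

    import numpy as np

    def sample_and_plan(counts, reward_sums, H, rng):
        """One planning step of posterior sampling for a tabular episodic
        MDP: sample a transition kernel from Dirichlet(1 + counts) per
        (state, action), use posterior-mean rewards, then run backward
        induction over horizon H to get a step-dependent greedy policy."""
        S, A, _ = counts.shape
        n = np.maximum(counts.sum(axis=2), 1)
        P = np.array([[rng.dirichlet(1.0 + counts[s, a]) for a in range(A)]
                      for s in range(S)])
        R = reward_sums / n
        V = np.zeros(S)
        policy = np.zeros((H, S), dtype=int)
        for h in reversed(range(H)):
            Q = R + P @ V        # Q[s, a] = R[s, a] + E_{s' ~ P}[V(s')]
            policy[h] = Q.argmax(axis=1)
            V = Q.max(axis=1)
        return policy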

Bootstrapping upper confidence bound

B Hao, Y Abbasi-Yadkori, Z Wen… - Advances in Neural …, 2019 - proceedings.neurips.cc
The Upper Confidence Bound (UCB) method is arguably the most celebrated one used
in online decision making with partial information feedback. Existing techniques for …
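
For reference, the method this paper bootstraps is the classic UCB index: play each arm once, then always pull the arm with the highest empirical mean plus confidence radius. A minimal UCB1 sketch, with the Bernoulli test arms as a toy assumption:

    import math, random

    def ucb1(pull, K, T):
        """UCB1 for K arms with rewards in [0, 1]: after one round-robin
        pass, pull argmax of empirical mean + sqrt(2 ln t / n_i)."""
        counts, sums = [0] * K, [0.0] * K
        for t in range(1, T + 1):
            if t <= K:
                a = t - 1                            # initialization pass
            else:
                a = max(range(K), key=lambda i: sums[i] / counts[i]
                        + math.sqrt(2.0 * math.log(t) / counts[i]))
            r = pull(a)
            counts[a] += 1
            sums[a] += r
        return sums, counts

    # Toy usage: three Bernoulli arms with unknown means.
    means = [0.2, 0.5, 0.7]
    ucb1(lambda a: float(random.random() < means[a]), K=3, T=1000)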

Thompson sampling and approximate inference

M Phan, Y Abbasi-Yadkori… - Advances in Neural …, 2019 - proceedings.neurips.cc
We study the effects of approximate inference on the performance of Thompson sampling in
$k$-armed bandit problems. Thompson sampling is a successful algorithm for online …
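
The question the paper studies can be made concrete in a few lines: Thompson sampling needs only one posterior sample per round, so what changes when that sample comes from an approximation? The sketch below contrasts an exact Beta sample with a moment-matched Gaussian; the Gaussian substitution is our illustrative stand-in for "approximate inference," not the specific approximations analyzed in the paper.

    import numpy as np

    def ts_step(alpha, beta, rng, approximate=False):
        """One Thompson-sampling arm choice for Bernoulli bandits with
        Beta(alpha, beta) posteriors. With approximate=True, each Beta is
        replaced by a Gaussian matching its mean and variance."""
        if approximate:
            mean = alpha / (alpha + beta)
            var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
            theta = rng.normal(mean, np.sqrt(var))
        else:
            theta = rng.beta(alpha, beta)
        return int(np.argmax(theta))

    rng = np.random.default_rng(0)
    alpha, beta = np.array([3.0, 8.0, 2.0]), np.array([5.0, 4.0, 2.0])
    a_exact = ts_step(alpha, beta, rng)
    a_approx = ts_step(alpha, beta, rng, approximate=True)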

An analysis of ensemble sampling

C Qin, Z Wen, X Lu, B Van Roy - Advances in Neural …, 2022 - proceedings.neurips.cc
Ensemble sampling serves as a practical approximation to Thompson sampling when
maintaining an exact posterior distribution over model parameters is computationally …
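
The approximation keeps a small ensemble in place of a posterior: each member is trained on independently perturbed rewards, and acting greedily on a uniformly drawn member stands in for sampling from the posterior. A minimal K-armed sketch, with the Gaussian perturbation scale sigma as an illustrative assumption:

    import numpy as np

    def ensemble_sampling(pull, K, T, M=10, sigma=1.0, rng=None):
        """Ensemble sampling for a K-armed bandit: M mean estimates, each
        updated with its own Gaussian-perturbed copy of every reward;
        per round, act greedily w.r.t. one member chosen uniformly."""
        rng = rng or np.random.default_rng()
        counts = np.zeros(K)
        sums = rng.normal(0.0, sigma, size=(M, K))   # randomized initialization
        for _ in range(T):
            m = rng.integers(M)
            a = int(np.argmax(sums[m] / np.maximum(counts, 1)))
            r = pull(a)
            counts[a] += 1
            # Every member sees the reward plus fresh independent noise.
            sums[:, a] += r + rng.normal(0.0, sigma, size=M)
        return sums, counts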

BanditPAM: Almost linear time k-medoids clustering via multi-armed bandits

M Tiwari, MJ Zhang, J Mayclin… - Advances in …, 2020 - proceedings.neurips.cc
Clustering is a ubiquitous task in data science. Compared to the commonly used k-means
clustering, k-medoids clustering requires the cluster centers to be actual data points and …
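
The connection to bandits: choosing a medoid is a best-arm identification problem, where each candidate point's "reward" is its average distance to the rest, estimable from a small sample instead of all n^2 pairs. A deliberately simplified 1-medoid sketch (the actual algorithm allocates samples adaptively with confidence intervals; the fixed batch here is an assumption for brevity):

    import numpy as np

    def approx_medoid(X, batch=100, rng=None):
        """Estimate the 1-medoid of X: each point's mean distance to a
        random reference batch approximates its mean distance to all
        points, and the minimizer is returned as the medoid index."""
        rng = rng or np.random.default_rng()
        n = len(X)
        ref = X[rng.choice(n, size=min(batch, n), replace=False)]
        est = np.linalg.norm(X[:, None, :] - ref[None, :, :], axis=2).mean(axis=1)
        return int(np.argmin(est))   # the medoid is an actual data point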

Bandit algorithms based on Thompson sampling for bounded reward distributions

C Riou, J Honda - Algorithmic Learning Theory, 2020 - proceedings.mlr.press
We focus on a classic reinforcement learning problem, the multi-armed bandit, and more
specifically on the stochastic setting with reward distributions bounded in $[0,1]$. For …
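
A standard baseline for this setting is worth sketching: Thompson sampling handles any reward in $[0,1]$ via Bernoulli rounding, replacing the reward with a coin flip of matching mean so that a Beta posterior still applies. That reduction is the classical trick, shown below; the paper itself analyzes TS variants with richer posteriors for bounded rewards.

    import numpy as np

    def ts_bounded(pull, K, T, rng=None):
        """Thompson sampling for rewards in [0, 1] via binarization:
        observe r, flip a Bernoulli(r) coin, update a Beta posterior."""
        rng = rng or np.random.default_rng()
        alpha, beta = np.ones(K), np.ones(K)
        for _ in range(T):
            a = int(np.argmax(rng.beta(alpha, beta)))
            r = pull(a)                    # any reward in [0, 1]
            b = float(rng.random() < r)    # Bernoulli rounding
            alpha[a] += b
            beta[a] += 1.0 - b
        return alpha, beta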

Feedback efficient online fine-tuning of diffusion models

M Uehara, Y Zhao, K Black, E Hajiramezanali… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models excel at modeling complex data distributions, including those of images,
proteins, and small molecules. However, in many cases, our goal is to model parts of the …