[BOOK] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Beyond UCB: Optimal and efficient contextual bandits with regression oracles
A fundamental challenge in contextual bandits is to develop flexible, general-purpose
algorithms with computational requirements no worse than classical supervised learning …
Randomized exploration in generalized linear bandits
We study two randomized algorithms for generalized linear bandits. The first, GLM-TSL,
samples a generalized linear model (GLM) from the Laplace approximation to the posterior …
Optimistic posterior sampling for reinforcement learning with few samples and tight guarantees
We consider reinforcement learning in an environment modeled by an episodic, tabular,
step-dependent Markov decision process of horizon $ H $ with $ S $ states, and $ A …
Bootstrapping upper confidence bound
The Upper Confidence Bound (UCB) method is arguably the most celebrated one used
in online decision making with partial information feedback. Existing techniques for …
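As a reference point for the UCB method this entry discusses, here is a minimal sketch of the classical UCB1 index rule (empirical mean plus a $\sqrt{2\ln t / n_i}$ exploration bonus); the function and argument names are illustrative, not from any of the papers listed here.

```python
import math

def ucb1(pull, n_arms, horizon):
    """UCB1 for a stochastic bandit with rewards in [0, 1].

    pull(arm) -> reward is supplied by the caller. After pulling
    each arm once, pick the arm maximizing
    mean_i + sqrt(2 * ln(t) / n_i).
    """
    counts = [0] * n_arms   # n_i: number of pulls of arm i
    sums = [0.0] * n_arms   # cumulative reward of arm i
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1     # initialization: pull every arm once
        else:
            arm = max(
                range(n_arms),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        r = pull(arm)
        counts[arm] += 1
        sums[arm] += r
    return counts, sums
```

With two deterministic arms paying 0.2 and 0.8, the index rule concentrates its pulls on the better arm while still revisiting the worse one at a logarithmic rate.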
Thompson sampling and approximate inference
M Phan, Y Abbasi Yadkori… - Advances in Neural …, 2019 - proceedings.neurips.cc
We study the effects of approximate inference on the performance of Thompson sampling in
the $ k $-armed bandit problems. Thompson sampling is a successful algorithm for online …
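Since several of these entries build on Thompson sampling, a minimal Beta-Bernoulli sketch of the algorithm may be a useful reference: maintain a Beta posterior per arm, sample one mean per arm each round, and pull the argmax. This is a generic textbook sketch, not code from any paper listed here.

```python
import random

def thompson_bernoulli(pull, n_arms, horizon, seed=0):
    """Thompson sampling for Bernoulli rewards.

    Each arm i keeps a Beta(succ_i + 1, fail_i + 1) posterior
    (uniform prior). pull(arm) -> bool is supplied by the caller.
    """
    rng = random.Random(seed)
    succ = [0] * n_arms
    fail = [0] * n_arms
    for _ in range(horizon):
        # Draw one sample from each arm's posterior ...
        samples = [rng.betavariate(succ[i] + 1, fail[i] + 1)
                   for i in range(n_arms)]
        # ... and play the arm whose sample is largest.
        arm = max(range(n_arms), key=lambda i: samples[i])
        if pull(arm):
            succ[arm] += 1
        else:
            fail[arm] += 1
    return succ, fail
```

The per-round posterior sampling step is exactly what the approximate-inference and ensemble-sampling entries below replace with cheaper surrogates when an exact posterior is unavailable.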
An analysis of ensemble sampling
Ensemble sampling serves as a practical approximation to Thompson sampling when
maintaining an exact posterior distribution over model parameters is computationally …
BanditPAM: Almost linear time k-medoids clustering via multi-armed bandits
Clustering is a ubiquitous task in data science. Compared to the commonly used k-means
clustering, k-medoids clustering requires the cluster centers to be actual data points and …
Bandit algorithms based on Thompson sampling for bounded reward distributions
We focus on a classic reinforcement learning problem, called a multi-armed bandit, and
more specifically in the stochastic setting with reward distributions bounded in $[0, 1] $. For …
Feedback efficient online fine-tuning of diffusion models
Diffusion models excel at modeling complex data distributions, including those of images,
proteins, and small molecules. However, in many cases, our goal is to model parts of the …