[BOOK] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Beyond UCB: Optimal and efficient contextual bandits with regression oracles

D Foster, A Rakhlin - International Conference on Machine …, 2020 - proceedings.mlr.press
A fundamental challenge in contextual bandits is to develop flexible, general-purpose
algorithms with computational requirements no worse than classical supervised learning …
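
This paper's key device, inverse-gap weighting over a regression oracle's predicted rewards, is easy to sketch. Below is a minimal Python illustration; the toy oracle predictions, the number of arms, and the learning-rate parameter gamma are illustrative assumptions, not the paper's tuned choices.

    import numpy as np

    def inverse_gap_weighting(predicted_rewards, gamma):
        """Map a regression oracle's predicted rewards to an action
        distribution: every non-best arm gets probability inversely
        proportional to its predicted gap from the best arm, and the
        empirically best arm keeps the leftover mass."""
        mu = np.asarray(predicted_rewards, dtype=float)
        K = len(mu)
        best = int(np.argmax(mu))
        probs = np.empty(K)
        others = np.arange(K) != best
        probs[others] = 1.0 / (K + gamma * (mu[best] - mu[others]))
        probs[best] = 1.0 - probs[others].sum()
        return probs

    # Toy usage: four arms; larger gamma makes the policy greedier.
    rng = np.random.default_rng(0)
    p = inverse_gap_weighting([0.9, 0.7, 0.5, 0.1], gamma=10.0)
    action = rng.choice(4, p=p)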

Randomized exploration in generalized linear bandits

B Kveton, M Zaheer, C Szepesvári… - International …, 2020 - proceedings.mlr.press
We study two randomized algorithms for generalized linear bandits. The first, GLM-TSL,
samples a generalized linear model (GLM) from the Laplace approximation to the posterior …
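
The snippet already names the mechanism: draw one model from a Laplace (Gaussian) approximation to the GLM posterior and act greedily on it. A minimal sketch for the logistic case follows; the Newton solver, the regularizer lam, and the function interface are assumptions for illustration, not the paper's exact GLM-TSL specification.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def laplace_ts_action(X_hist, y_hist, arms, lam=1.0, steps=20, rng=None):
        """Pick an arm by Thompson sampling with a Laplace approximation:
        fit the MAP estimate of a logistic model, approximate the posterior
        by N(theta_MAP, H^{-1}), draw one sample, and act greedily."""
        rng = rng or np.random.default_rng()
        d = arms.shape[1]
        theta = np.zeros(d)
        for _ in range(steps):                      # Newton's method for the MAP
            p = sigmoid(X_hist @ theta)
            grad = X_hist.T @ (p - y_hist) + lam * theta
            H = X_hist.T @ ((p * (1 - p))[:, None] * X_hist) + lam * np.eye(d)
            theta -= np.linalg.solve(H, grad)
        p = sigmoid(X_hist @ theta)                 # Hessian at the MAP
        H = X_hist.T @ ((p * (1 - p))[:, None] * X_hist) + lam * np.eye(d)
        sample = rng.multivariate_normal(theta, np.linalg.inv(H))
        return int(np.argmax(arms @ sample))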

Optimistic posterior sampling for reinforcement learning with few samples and tight guarantees

D Tiapkin, D Belomestny… - Advances in …, 2022 - proceedings.neurips.cc
We consider reinforcement learning in an environment modeled by an episodic, tabular,
step-dependent Markov decision process of horizon $H$ with $S$ states, and $A$ …
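
The skeleton behind this line of work is simple to state: each episode, draw a plausible MDP from the posterior and play its optimal policy. A hedged sketch below; the Dirichlet prior, the posterior-mean rewards, and the array layout are illustrative assumptions, and the paper's contribution is an optimistic modification with tighter guarantees, not this plain loop.

    import numpy as np

    def sample_and_plan(counts, reward_sums, H, rng):
        """One planning step of posterior sampling for a tabular episodic
        MDP: sample a transition kernel from Dirichlet(1 + counts) per
        (state, action), use posterior-mean rewards, then run backward
        induction over horizon H to get a step-dependent greedy policy."""
        S, A, _ = counts.shape
        n = np.maximum(counts.sum(axis=2), 1)
        P = np.array([[rng.dirichlet(1.0 + counts[s, a]) for a in range(A)]
                      for s in range(S)])
        R = reward_sums / n
        V = np.zeros(S)
        policy = np.zeros((H, S), dtype=int)
        for h in reversed(range(H)):
            Q = R + P @ V        # Q[s, a] = R[s, a] + E_{s' ~ P}[V(s')]
            policy[h] = Q.argmax(axis=1)
            V = Q.max(axis=1)
        return policy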

Bootstrapping upper confidence bound

B Hao, Y Abbasi-Yadkori, Z Wen… - Advances in Neural …, 2019 - proceedings.neurips.cc
The Upper Confidence Bound (UCB) method is arguably the most celebrated one used
in online decision making with partial information feedback. Existing techniques for …
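
For reference, the method this paper bootstraps is the classic UCB index: play each arm once, then always pull the arm with the highest empirical mean plus confidence radius. A minimal UCB1 sketch, with the Bernoulli test arms as a toy assumption:

    import math, random

    def ucb1(pull, K, T):
        """UCB1 for K arms with rewards in [0, 1]: after one round-robin
        pass, pull argmax of empirical mean + sqrt(2 ln t / n_i)."""
        counts, sums = [0] * K, [0.0] * K
        for t in range(1, T + 1):
            if t <= K:
                a = t - 1                            # initialization pass
            else:
                a = max(range(K), key=lambda i: sums[i] / counts[i]
                        + math.sqrt(2.0 * math.log(t) / counts[i]))
            r = pull(a)
            counts[a] += 1
            sums[a] += r
        return sums, counts

    # Toy usage: three Bernoulli arms with unknown means.
    means = [0.2, 0.5, 0.7]
    ucb1(lambda a: float(random.random() < means[a]), K=3, T=1000)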

Thompson sampling and approximate inference

M Phan, Y Abbasi-Yadkori… - Advances in Neural …, 2019 - proceedings.neurips.cc
We study the effects of approximate inference on the performance of Thompson sampling in
$k$-armed bandit problems. Thompson sampling is a successful algorithm for online …
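
The question the paper studies can be made concrete in a few lines: Thompson sampling needs only one posterior sample per round, so what changes when that sample comes from an approximation? The sketch below contrasts an exact Beta sample with a moment-matched Gaussian; the Gaussian substitution is our illustrative stand-in for "approximate inference," not the specific approximations analyzed in the paper.

    import numpy as np

    def ts_step(alpha, beta, rng, approximate=False):
        """One Thompson-sampling arm choice for Bernoulli bandits with
        Beta(alpha, beta) posteriors. With approximate=True, each Beta is
        replaced by a Gaussian matching its mean and variance."""
        if approximate:
            mean = alpha / (alpha + beta)
            var = alpha * beta / ((alpha + beta) ** 2 * (alpha + beta + 1))
            theta = rng.normal(mean, np.sqrt(var))
        else:
            theta = rng.beta(alpha, beta)
        return int(np.argmax(theta))

    rng = np.random.default_rng(0)
    alpha, beta = np.array([3.0, 8.0, 2.0]), np.array([5.0, 4.0, 2.0])
    a_exact = ts_step(alpha, beta, rng)
    a_approx = ts_step(alpha, beta, rng, approximate=True)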

An analysis of ensemble sampling

C Qin, Z Wen, X Lu, B Van Roy - Advances in Neural …, 2022 - proceedings.neurips.cc
Ensemble sampling serves as a practical approximation to Thompson sampling when
maintaining an exact posterior distribution over model parameters is computationally …
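
The approximation keeps a small ensemble in place of a posterior: each member is trained on independently perturbed rewards, and acting greedily on a uniformly drawn member stands in for sampling from the posterior. A minimal K-armed sketch, with the Gaussian perturbation scale sigma as an illustrative assumption:

    import numpy as np

    def ensemble_sampling(pull, K, T, M=10, sigma=1.0, rng=None):
        """Ensemble sampling for a K-armed bandit: M mean estimates, each
        updated with its own Gaussian-perturbed copy of every reward;
        per round, act greedily w.r.t. one member chosen uniformly."""
        rng = rng or np.random.default_rng()
        counts = np.zeros(K)
        sums = rng.normal(0.0, sigma, size=(M, K))   # randomized initialization
        for _ in range(T):
            m = rng.integers(M)
            a = int(np.argmax(sums[m] / np.maximum(counts, 1)))
            r = pull(a)
            counts[a] += 1
            # Every member sees the reward plus fresh independent noise.
            sums[:, a] += r + rng.normal(0.0, sigma, size=M)
        return sums, counts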

BanditPAM: Almost linear time k-medoids clustering via multi-armed bandits

M Tiwari, MJ Zhang, J Mayclin… - Advances in …, 2020 - proceedings.neurips.cc
Clustering is a ubiquitous task in data science. Compared to the commonly used k-means
clustering, k-medoids clustering requires the cluster centers to be actual data points and …
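
The connection to bandits: choosing a medoid is a best-arm identification problem, where each candidate point's "reward" is its average distance to the rest, estimable from a small sample instead of all n^2 pairs. A deliberately simplified 1-medoid sketch (the actual algorithm allocates samples adaptively with confidence intervals; the fixed batch here is an assumption for brevity):

    import numpy as np

    def approx_medoid(X, batch=100, rng=None):
        """Estimate the 1-medoid of X: each point's mean distance to a
        random reference batch approximates its mean distance to all
        points, and the minimizer is returned as the medoid index."""
        rng = rng or np.random.default_rng()
        n = len(X)
        ref = X[rng.choice(n, size=min(batch, n), replace=False)]
        est = np.linalg.norm(X[:, None, :] - ref[None, :, :], axis=2).mean(axis=1)
        return int(np.argmin(est))   # the medoid is an actual data point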

Bandit algorithms based on Thompson sampling for bounded reward distributions

C Riou, J Honda - Algorithmic Learning Theory, 2020 - proceedings.mlr.press
We focus on a classic reinforcement learning problem, the multi-armed bandit, and more
specifically on the stochastic setting with reward distributions bounded in $[0,1]$. For …
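
A standard baseline for this setting is worth sketching: Thompson sampling handles any reward in $[0,1]$ via Bernoulli rounding, replacing the reward with a coin flip of matching mean so that a Beta posterior still applies. That reduction is the classical trick, shown below; the paper itself analyzes TS variants with richer posteriors for bounded rewards.

    import numpy as np

    def ts_bounded(pull, K, T, rng=None):
        """Thompson sampling for rewards in [0, 1] via binarization:
        observe r, flip a Bernoulli(r) coin, update a Beta posterior."""
        rng = rng or np.random.default_rng()
        alpha, beta = np.ones(K), np.ones(K)
        for _ in range(T):
            a = int(np.argmax(rng.beta(alpha, beta)))
            r = pull(a)                    # any reward in [0, 1]
            b = float(rng.random() < r)    # Bernoulli rounding
            alpha[a] += b
            beta[a] += 1.0 - b
        return alpha, beta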

Feedback efficient online fine-tuning of diffusion models

M Uehara, Y Zhao, K Black, E Hajiramezanali… - arXiv preprint arXiv …, 2024 - arxiv.org
Diffusion models excel at modeling complex data distributions, including those of images,
proteins, and small molecules. However, in many cases, our goal is to model parts of the …