A tutorial on Thompson sampling

DJ Russo, B Van Roy, A Kazerouni… - … and Trends® in …, 2018 - nowpublishers.com
Thompson sampling is an algorithm for online decision problems where actions are taken
sequentially in a manner that must balance between exploiting what is known to maximize …
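
As a quick illustration of the exploit/explore balance the tutorial describes, here is a minimal sketch of the Beta-Bernoulli special case of Thompson sampling; the arm means and horizon below are made up for the demo.

```python
import numpy as np

def thompson_sampling_bernoulli(true_means, horizon, seed=0):
    """Beta-Bernoulli Thompson sampling (sketch; inputs are illustrative)."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    alpha, beta = np.ones(k), np.ones(k)     # Beta(1, 1) prior per arm
    total = 0
    for _ in range(horizon):
        theta = rng.beta(alpha, beta)        # sample a mean for each arm
        arm = int(np.argmax(theta))          # play the arm whose sample is best
        r = rng.binomial(1, true_means[arm]) # observe a Bernoulli reward
        alpha[arm] += r                      # posterior update: success count
        beta[arm] += 1 - r                   # posterior update: failure count
        total += r
    return total

print(thompson_sampling_bernoulli([0.3, 0.5, 0.7], horizon=1000))
```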

[BOOK] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
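
For context on the framework the book covers, a minimal sketch of the bandit interaction loop with UCB1, one of the classical index policies analyzed there; the arm means are illustrative.

```python
import numpy as np

def ucb1(true_means, horizon, seed=0):
    """UCB1 (sketch): play each arm once, then maximize mean + exploration bonus."""
    rng = np.random.default_rng(seed)
    k = len(true_means)
    counts, sums = np.zeros(k), np.zeros(k)
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1                                   # initialization round
        else:
            bonus = np.sqrt(2 * np.log(t) / counts)       # confidence width
            arm = int(np.argmax(sums / counts + bonus))
        r = rng.binomial(1, true_means[arm])
        counts[arm] += 1
        sums[arm] += r
    return sums.sum()

print(ucb1([0.3, 0.5, 0.7], horizon=1000))
```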

Neural Thompson sampling

W Zhang, D Zhou, L Li, Q Gu - arXiv preprint arXiv:2010.00827, 2020 - arxiv.org
Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-
armed bandit problems. In this paper, we propose a new algorithm, called Neural Thompson …
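
A simplified sketch of the NeuralTS-style selection step: the sampled score's variance is built from the model's parameter gradient, which in the linear demo below is just the context itself. The names f, grad_f, U, and nu are placeholder bookkeeping, not the paper's exact construction.

```python
import numpy as np

def neural_ts_step(f, grad_f, contexts, U, nu, rng):
    """Pick an arm by sampling each score from N(f(x), nu^2 * g U^{-1} g)."""
    scores = []
    for x in contexts:
        g = grad_f(x)                                   # gradient "features"
        var = nu**2 * g @ np.linalg.solve(U, g)         # posterior-variance proxy
        scores.append(rng.normal(f(x), np.sqrt(max(var, 0.0))))
    return int(np.argmax(scores))

# Demo with a linear "network" f(x) = w @ x, whose gradient features are x.
rng = np.random.default_rng(0)
w = np.array([0.2, -0.1, 0.4])
f, grad_f = lambda x: w @ x, lambda x: x
U = np.eye(3)
contexts = [rng.normal(size=3) for _ in range(5)]
print(neural_ts_step(f, grad_f, contexts, U, nu=0.5, rng=rng))
```

After playing an arm with gradient g, one would update U with the rank-one term g g^T and retrain the network on the new observation.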

On information gain and regret bounds in Gaussian process bandits

S Vakili, K Khezeli, V Picheny - International Conference on …, 2021 - proceedings.mlr.press
Consider the sequential optimization of an expensive-to-evaluate and possibly non-convex
objective function $f$ from noisy feedback, which can be considered as a continuum-armed …
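
A rough sketch of the loop such papers analyze, here GP-UCB on a 1-D grid: the kernel lengthscale, noise level, and beta schedule are illustrative, and this paper's contribution concerns tighter bounds on the information gain term that calibrates such confidence widths.

```python
import numpy as np

def rbf(a, b, ls=0.3):
    """Squared-exponential kernel on 1-D inputs."""
    return np.exp(-0.5 * ((a[:, None] - b[None, :]) / ls) ** 2)

def gp_posterior(X, y, Xstar, noise=0.1):
    """Standard GP regression posterior mean and variance on Xstar."""
    K = rbf(X, X) + noise**2 * np.eye(len(X))
    Ks = rbf(X, Xstar)
    sol = np.linalg.solve(K, Ks)
    mu = sol.T @ y
    var = np.diag(rbf(Xstar, Xstar)) - np.einsum('ij,ij->j', Ks, sol)
    return mu, np.maximum(var, 1e-12)

rng = np.random.default_rng(0)
grid = np.linspace(0, 1, 200)
f = lambda x: np.sin(6 * x)                     # the unknown objective (simulated)
X, y = np.array([0.5]), np.array([f(0.5)])
for t in range(2, 30):
    mu, var = gp_posterior(X, y, grid)
    beta = 2.0 * np.log(len(grid) * t**2)       # width tied to information gain
    x = grid[np.argmax(mu + np.sqrt(beta * var))]
    X, y = np.append(X, x), np.append(y, f(x) + 0.1 * rng.normal())
print(X[-1])                                    # settles near the maximizer
```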

Frequentist regret bounds for randomized least-squares value iteration

A Zanette, D Brandfonbrener… - International …, 2020 - proceedings.mlr.press
We consider the exploration-exploitation dilemma in finite-horizon reinforcement learning
(RL). When the state space is large or continuous, traditional tabular approaches are …
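
A tabular sketch of the randomized least-squares value iteration idea: each stage's Q-values are fit to Gaussian-perturbed Bellman targets, so exploration comes from the regression noise rather than explicit bonuses. The counts/rsum/nxt bookkeeping and the sigma, lam values are assumptions for the demo, not the paper's notation.

```python
import numpy as np

def rlsvi_episode(counts, rsum, nxt, H, nS, nA, sigma=1.0, lam=1.0, rng=None):
    """One planning pass: ridge-regress noise-perturbed targets, stage by stage."""
    if rng is None:
        rng = np.random.default_rng()
    Q = np.zeros((H + 1, nS, nA))                    # Q[H] is the zero boundary
    for h in range(H - 1, -1, -1):
        vnext = Q[h + 1].max(axis=1)                 # greedy value at stage h+1
        for s in range(nS):
            for a in range(nA):
                n = counts[h, s, a]
                target = rsum[h, s, a] + nxt[h, s, a] @ vnext
                noise = sigma * np.sqrt(n) * rng.normal()   # summed per-sample noise
                Q[h, s, a] = (target + noise) / (n + lam)   # tabular ridge solution
    return Q

rng = np.random.default_rng(0)
H, nS, nA = 3, 2, 2
nxt = rng.integers(0, 4, size=(H, nS, nA, nS)).astype(float)  # transition counts
counts = nxt.sum(axis=-1)                                     # visits per (h, s, a)
rsum = counts * rng.random((H, nS, nA))                       # summed rewards
print(rlsvi_episode(counts, rsum, nxt, H, nS, nA, rng=rng)[0])
```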

Efficient exploration through Bayesian deep Q-networks

K Azizzadenesheli, E Brunskill… - 2018 Information …, 2018 - ieeexplore.ieee.org
We propose Bayesian Deep Q-Network (BDQN), a practical Thompson sampling-based
reinforcement learning (RL) algorithm. Thompson sampling allows for targeted exploration …
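
A sketch of the BDQN ingredient: Bayesian linear regression over the Q-network's last-layer features, with actions chosen greedily under one posterior sample of the head weights. Phi, y, and the variance parameters are placeholders, not the paper's exact setup.

```python
import numpy as np

def bdqn_sample_head(Phi, y, sigma2=1.0, prior_var=1.0, rng=None):
    """Sample head weights from the Bayesian linear-regression posterior."""
    if rng is None:
        rng = np.random.default_rng()
    d = Phi.shape[1]
    A = Phi.T @ Phi / sigma2 + np.eye(d) / prior_var   # posterior precision
    mean = np.linalg.solve(A, Phi.T @ y / sigma2)      # posterior mean
    return rng.multivariate_normal(mean, np.linalg.inv(A))

rng = np.random.default_rng(0)
Phi = rng.normal(size=(50, 4))                  # last-layer features of (s, a) pairs
y = Phi @ np.array([1.0, -0.5, 0.2, 0.0]) + 0.1 * rng.normal(size=50)  # TD targets
w = bdqn_sample_head(Phi, y, rng=rng)
print(w)   # act greedily w.r.t. phi(s, a) @ w until the next posterior resample
```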

Learning to optimize under non-stationarity

WC Cheung, D Simchi-Levi… - The 22nd International …, 2019 - proceedings.mlr.press
We introduce algorithms that achieve state-of-the-art dynamic regret bounds for the non-
stationary linear stochastic bandit setting, which captures natural applications such as dynamic …
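
A sketch of the forgetting mechanism such algorithms build on: a sliding-window ridge estimate that discards data older than w rounds so the estimate can track a drifting parameter. The window size and regularizer are illustrative; the paper pairs this kind of estimator with confidence bonuses.

```python
import numpy as np

def sliding_window_ridge(X, y, w, lam=1.0):
    """Ridge estimate over only the last w rounds (sketch)."""
    Xw, yw = X[-w:], y[-w:]
    d = X.shape[1]
    return np.linalg.solve(Xw.T @ Xw + lam * np.eye(d), Xw.T @ yw)

rng = np.random.default_rng(0)
T, d = 500, 3
X = rng.normal(size=(T, d))
theta = np.where(np.arange(T)[:, None] < 250, [1.0, 0.0, 0.0], [0.0, 1.0, 0.0])
y = np.einsum('td,td->t', X, theta) + 0.1 * rng.normal(size=T)  # abrupt change
print(sliding_window_ridge(X, y, w=100))   # tracks the post-change parameter
```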

Bayesian decision-making under misspecified priors with applications to meta-learning

M Simchowitz, C Tosh… - Advances in …, 2021 - proceedings.neurips.cc
Thompson sampling and other Bayesian sequential decision-making algorithms are among
the most popular approaches to tackle explore/exploit trade-offs in (contextual) bandits. The …
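
A small simulation of the failure mode the paper studies: Bernoulli Thompson sampling run once with a flat prior and once with a confidently wrong one. All numbers are illustrative.

```python
import numpy as np

def ts_regret(true_means, prior_a, prior_b, horizon=2000, seed=0):
    """Bernoulli TS under a (possibly misspecified) Beta prior; returns regret."""
    rng = np.random.default_rng(seed)
    a, b = np.array(prior_a, float), np.array(prior_b, float)
    best, regret = max(true_means), 0.0
    for _ in range(horizon):
        arm = int(np.argmax(rng.beta(a, b)))
        r = rng.binomial(1, true_means[arm])
        a[arm] += r
        b[arm] += 1 - r
        regret += best - true_means[arm]
    return regret

means = [0.4, 0.6]
print(ts_regret(means, [1, 1], [1, 1]))     # flat prior: low regret
print(ts_regret(means, [20, 1], [1, 20]))   # prior confident in the worse arm
```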

Online (multinomial) logistic bandit: Improved regret and constant computation cost

YJ Zhang, M Sugiyama - Advances in Neural Information …, 2024 - proceedings.neurips.cc
This paper investigates the logistic bandit problem, a variant of the generalized linear bandit
model that uses a logistic function to model the feedback from an action. While most existing …
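
A sketch of a logistic bandit round with constant per-round cost: optimistic scores under the logistic link, then a single online gradient step on the observed binary feedback, so the update does not grow with the history. The step size, bonus, and simulated feedback model are placeholders, not the paper's algorithm.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_bandit_round(theta, contexts, rng, eta=0.5, bonus=0.3):
    """One round: optimistic arm choice, then an O(d) logistic update."""
    scores = [sigmoid(theta @ x) + bonus * np.linalg.norm(x) for x in contexts]
    arm = int(np.argmax(scores))
    x = contexts[arm]
    r = rng.binomial(1, sigmoid(x[0] - 0.5 * x[1]))      # simulated feedback
    theta = theta + eta * (r - sigmoid(theta @ x)) * x   # one gradient step
    return theta, arm, r

rng = np.random.default_rng(0)
theta = np.zeros(2)
for _ in range(200):
    contexts = [rng.normal(size=2) for _ in range(5)]
    theta, arm, r = logistic_bandit_round(theta, contexts, rng)
print(theta)   # drifts toward the simulated parameter (1.0, -0.5)
```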

Meta dynamic pricing: Transfer learning across experiments

H Bastani, D Simchi-Levi, R Zhu - Management Science, 2022 - pubsonline.informs.org
We study the problem of learning shared structure across a sequence of dynamic pricing
experiments for related products. We consider a practical formulation in which the unknown …
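
A sketch of the transfer idea: pool the posterior estimates from earlier pricing experiments into an empirical Gaussian meta-prior for the next product's Thompson-sampling run. The two-parameter demand model and all numbers are made up for illustration.

```python
import numpy as np

def fit_meta_prior(posterior_means):
    """Empirical Gaussian prior from per-experiment parameter estimates."""
    means = np.asarray(posterior_means, float)
    return means.mean(axis=0), np.cov(means, rowvar=False)

# Demand parameters (intercept, price slope) learned in past experiments.
past = [[1.9, -0.8], [2.1, -1.0], [2.0, -0.9], [2.2, -1.1]]
mu0, Sigma0 = fit_meta_prior(past)
print(mu0)     # prior mean for the next product's pricing experiment
print(Sigma0)  # prior covariance; wider means more exploration early on
```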