[Book] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Learning to optimize via posterior sampling
This paper considers the use of a simple posterior sampling algorithm to balance between
exploration and exploitation when learning to optimize actions such as in multiarmed bandit …
Near-optimal regret bounds for Thompson sampling
Thompson Sampling (TS) is one of the oldest heuristics for multiarmed bandit problems. It is
a randomized algorithm based on Bayesian ideas and has recently generated significant …
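Several of the entries above describe Thompson sampling as a randomized, Bayesian-inspired algorithm that balances exploration and exploitation by sampling from the posterior over arm means. A minimal illustrative sketch for the Bernoulli bandit case with Beta(1, 1) priors follows; the arm means and horizon are hypothetical, and this is not the exact algorithm analyzed in any one of the cited papers.

```python
import random


def thompson_sampling(true_means, horizon, seed=0):
    """Bernoulli Thompson sampling with Beta(1, 1) priors (illustrative sketch).

    true_means: hypothetical success probabilities of each arm.
    horizon: number of rounds to play.
    Returns the total reward collected.
    """
    rng = random.Random(seed)
    k = len(true_means)
    alpha = [1] * k  # Beta posterior: successes + 1
    beta = [1] * k   # Beta posterior: failures + 1
    total_reward = 0
    for _ in range(horizon):
        # Sample a plausible mean for each arm from its posterior,
        # then play the arm whose sample is largest.
        samples = [rng.betavariate(alpha[i], beta[i]) for i in range(k)]
        arm = max(range(k), key=lambda i: samples[i])
        # Observe a Bernoulli reward and update that arm's posterior.
        reward = 1 if rng.random() < true_means[arm] else 0
        total_reward += reward
        if reward:
            alpha[arm] += 1
        else:
            beta[arm] += 1
    return total_reward
```

The posterior sampling step is what distinguishes this from optimism-based methods such as UCB: exploration arises from posterior uncertainty rather than from explicit confidence bonuses.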
Simple Bayesian algorithms for best arm identification
D Russo - Conference on Learning Theory, 2016 - proceedings.mlr.press
This paper considers the optimal adaptive allocation of measurement effort for identifying the
best among a finite set of options or designs. An experimenter sequentially chooses designs …
An information-theoretic analysis of Thompson sampling
We provide an information-theoretic analysis of Thompson sampling that applies across a
broad range of online optimization problems in which a decision-maker must learn from …
Linear Thompson sampling revisited
We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic
linear bandit setting. While we obtain a regret bound of order $O(d^{3/2}\sqrt{T})$ as in …
From bandits to monte-carlo tree search: The optimistic principle applied to optimization and planning
R Munos - Foundations and Trends® in Machine Learning, 2014 - nowpublishers.com
This work covers several aspects of the optimism in the face of uncertainty principle applied
to large scale optimization problems under finite numerical budget. The initial motivation for …
Thompson sampling for complex online problems
We consider stochastic multi-armed bandit problems with complex actions over a set of
basic arms, where the decision maker plays a complex action rather than a basic arm in …
Optimal regret analysis of thompson sampling in stochastic multi-armed bandit problem with multiple plays
We discuss a multiple-play multi-armed bandit (MAB) problem in which several arms are
selected at each round. Recently, Thompson sampling (TS), a randomized algorithm with a …
Data poisoning attacks on stochastic bandits
Stochastic multi-armed bandits form a class of online learning problems that have important
applications in online recommendation systems, adaptive medical treatment, and many …