Dynamic pricing and learning: historical origins, current research, and new directions
AV Den Boer - Surveys in operations research and management …, 2015 - Elsevier
The topic of dynamic pricing and learning has received a considerable amount of attention
in recent years, from different scientific communities. We survey these literature streams: we …
[BOOK][B] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Online decision making with high-dimensional covariates
Big data have enabled decision makers to tailor decisions at the individual level in a variety
of domains, such as personalized medicine and online advertising. Doing so involves …
Regret analysis of stochastic and nonstochastic multi-armed bandit problems
Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …
Balanced linear contextual bandits
Contextual bandit algorithms are sensitive to the estimation method of the outcome model as
well as the exploration method used, particularly in the presence of rich heterogeneity or …
Estimation considerations in contextual bandits
Contextual bandit algorithms are sensitive to the estimation method of the outcome model as
well as the exploration method used, particularly in the presence of rich heterogeneity or …
From ads to interventions: Contextual bandits in mobile health
The first paper on contextual bandits was written by Michael Woodroofe in 1979 (Journal of
the American Statistical Association, 74 (368), 799–806, 1979) but the term “contextual …
Batched bandit problems
The Annals of Statistics 2016, Vol. 44, No. 2, 660–681 DOI:
10.1214/15-AOS1381 © Institute of Mathematical Statistics, 2016 BATCHED BANDIT …
Batched multi-armed bandits problem
In this paper, we study the multi-armed bandit problem in the batched setting where the
employed policy must split data into a small number of batches. While the minimax regret for …
Instance-dependent complexity of contextual bandits and reinforcement learning: A disagreement-based perspective
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved
performance on "easy" problems with a gap between the best and second-best arm. Are …