Dynamic pricing and learning: historical origins, current research, and new directions
AV Den Boer - Surveys in operations research and management …, 2015 - Elsevier
The topic of dynamic pricing and learning has received a considerable amount of attention
in recent years, from different scientific communities. We survey these literature streams: we …
[BOOK] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Online decision making with high-dimensional covariates
Big data have enabled decision makers to tailor decisions at the individual level in a variety
of domains, such as personalized medicine and online advertising. Doing so involves …
Beyond UCB: Optimal and efficient contextual bandits with regression oracles
A fundamental challenge in contextual bandits is to develop flexible, general-purpose
algorithms with computational requirements no worse than classical supervised learning …
Regret analysis of stochastic and nonstochastic multi-armed bandit problems
Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …
[PDF] Counterfactual reasoning and learning systems: The example of computational advertising.
This work shows how to leverage causal inference to understand the behavior of complex
learning systems interacting with their environment and predict the consequences of …
Stochastic multi-armed-bandit problem with non-stationary rewards
In a multi-armed bandit (MAB) problem a gambler needs to choose at each round of play
one of K arms, each characterized by an unknown reward distribution. Reward realizations …
Balanced linear contextual bandits
Contextual bandit algorithms are sensitive to the estimation method of the outcome model as
well as the exploration method used, particularly in the presence of rich heterogeneity or …
Estimation considerations in contextual bandits
Contextual bandit algorithms are sensitive to the estimation method of the outcome model as
well as the exploration method used, particularly in the presence of rich heterogeneity or …