Dynamic pricing and learning: historical origins, current research, and new directions
AV Den Boer - Surveys in operations research and management …, 2015 - Elsevier
The topic of dynamic pricing and learning has received a considerable amount of attention
in recent years, from different scientific communities. We survey these literature streams: we …
Neural Thompson sampling
W Zhang, D Zhou, L Li, Q Gu - International Conference on Learning Representations, 2021
Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-
armed bandit problems. In this paper, we propose a new algorithm, called Neural Thompson …
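A rough sketch of the core mechanism, not the paper's exact algorithm: score each arm by sampling from a Gaussian centered at a small network's prediction, with variance taken from the network's gradient through an accumulated design matrix. The architecture, dimensions, and constants below are illustrative assumptions, and training of the network on observed rewards is omitted.

```python
# Illustrative Neural-Thompson-style scoring (a simplification, not the
# paper's exact algorithm): sample a reward estimate from a Gaussian whose
# mean is a tiny network's prediction and whose variance is gradient-based.
import numpy as np

rng = np.random.default_rng(0)
d, m, nu, lam = 4, 16, 0.5, 1.0             # context dim, width, scale, ridge
W1 = rng.normal(0, 1 / np.sqrt(d), (m, d))  # hidden layer weights
w2 = rng.normal(0, 1 / np.sqrt(m), m)       # output layer weights
U = lam * np.eye(d * m + m)                 # design matrix over flat params

def forward(x):
    z = W1 @ x
    return w2 @ np.maximum(z, 0.0), z       # ReLU network prediction

def grad(x):
    # Gradient of the prediction w.r.t. all parameters, flattened.
    _, z = forward(x)
    dW1 = np.outer(w2 * (z > 0), x)
    return np.concatenate([dW1.ravel(), np.maximum(z, 0.0)])

def ts_score(x):
    pred, _ = forward(x)
    g = grad(x)
    sigma2 = g @ np.linalg.solve(U, g) / m  # gradient-based uncertainty
    return rng.normal(pred, nu * np.sqrt(sigma2))

# One round over three hypothetical arms.
arms = [rng.normal(size=d) for _ in range(3)]
chosen = max(range(3), key=lambda i: ts_score(arms[i]))
U += np.outer(grad(arms[chosen]), grad(arms[chosen])) / m
# (Training W1, w2 on the observed reward is omitted for brevity.)
```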
[BOOK][B] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Neural contextual bandits with UCB-based exploration
D Zhou, L Li, Q Gu - International Conference on Machine Learning, 2020
We study the stochastic contextual bandit problem, where the reward is generated from an
unknown function with additive noise. No assumption is made about the reward function …
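As background, the linear special case that this line of work generalizes is the classic LinUCB rule: a ridge estimate of the reward parameter plus a bonus proportional to the confidence width. A minimal sketch, with an assumed bonus multiplier alpha and a toy reward:

```python
# LinUCB-style arm selection (the linear ancestor of UCB-based neural
# exploration): ridge estimate of the reward model plus a confidence bonus.
import numpy as np

rng = np.random.default_rng(1)
d, alpha = 5, 1.0
A = np.eye(d)                                   # ridge-regularized design matrix
b = np.zeros(d)                                 # accumulated reward-weighted contexts

def ucb_score(x):
    theta = np.linalg.solve(A, b)               # ridge estimate
    width = np.sqrt(x @ np.linalg.solve(A, x))  # confidence width
    return x @ theta + alpha * width

for t in range(100):
    arms = [rng.normal(size=d) for _ in range(3)]
    i = max(range(3), key=lambda a: ucb_score(arms[a]))
    r = arms[i][0] + 0.1 * rng.normal()         # toy reward: first coordinate
    A += np.outer(arms[i], arms[i])
    b += r * arms[i]
```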
Weight uncertainty in neural network
C Blundell, J Cornebise, K Kavukcuoglu, D Wierstra - International Conference on Machine Learning, 2015
We introduce a new, efficient, principled and backpropagation-compatible algorithm for
learning a probability distribution on the weights of a neural network, called Bayes by …
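The mechanism behind Bayes by Backprop is a factorized Gaussian over the weights trained through the reparameterization trick. Below is a minimal sketch of that mechanism for a single-weight linear model with hand-derived gradients and toy data; the objective here (squared loss plus a KL penalty to a standard normal prior) is a simplified stand-in for the paper's variational objective.

```python
# Minimal Bayes-by-Backprop-style sketch: a Gaussian posterior over a single
# weight, trained with the reparameterization trick (toy data, hand gradients).
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=50)
y = 3.0 * x + 0.1 * rng.normal(size=50)  # true weight is 3.0

mu, rho, lr = 0.0, -3.0, 0.01            # posterior params; sigma = softplus(rho)
for step in range(2000):
    sigma = np.log1p(np.exp(rho))        # softplus keeps sigma positive
    eps = rng.normal()
    w = mu + sigma * eps                 # reparameterized weight sample
    dw = np.sum((w * x - y) * x)         # d(squared loss)/dw
    # Add gradients of KL(N(mu, sigma^2) || N(0, 1)) to the data gradients.
    dmu = dw + mu
    dsigma = dw * eps + sigma - 1.0 / sigma
    drho = dsigma / (1.0 + np.exp(-rho)) # chain rule through softplus
    mu, rho = mu - lr * dmu, rho - lr * drho

print(f"posterior mean {mu:.2f}, std {np.log1p(np.exp(rho)):.3f}")
```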
Regret analysis of stochastic and nonstochastic multi-armed bandit problems
S Bubeck, N Cesa-Bianchi - Foundations and Trends® in Machine Learning, 2012
Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …
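The trade-off is concrete in the classic UCB1 rule: after playing each arm once, pick the arm maximizing its empirical mean plus a bonus that shrinks as the arm is sampled. A sketch on made-up Bernoulli arms:

```python
# UCB1 on toy Bernoulli arms: exploit the best empirical mean while the
# sqrt(2 ln t / n) bonus forces occasional exploration of under-sampled arms.
import numpy as np

rng = np.random.default_rng(3)
p = [0.3, 0.5, 0.7]                      # hidden arm means (illustrative)
n = np.zeros(3)                          # pull counts
s = np.zeros(3)                          # reward sums

for t in range(1, 2001):
    if t <= 3:
        i = t - 1                        # play each arm once first
    else:
        ucb = s / n + np.sqrt(2 * np.log(t) / n)
        i = int(np.argmax(ucb))
    s[i] += rng.random() < p[i]
    n[i] += 1

print("pull counts:", n)                 # most pulls concentrate on arm 2
```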
Provably optimal algorithms for generalized linear contextual bandits
L Li, Y Lu, D Zhou - International Conference on Machine Learning, 2017
Contextual bandits are widely used in Internet services from news recommendation to
advertising, and to Web search. Generalized linear models (logistic regression in …
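A simplified sketch in the spirit of the paper's UCB-GLM (not its exact statement): refit a logistic model to past pulls by gradient ascent and add a confidence-width bonus. The toy environment, step sizes, and bonus multiplier below are assumptions.

```python
# Simplified logistic contextual bandit with a UCB-style bonus: fit theta by
# a few gradient steps on past data, then pick the arm with the largest
# predicted logit plus confidence width (in the spirit of UCB-GLM).
import numpy as np

rng = np.random.default_rng(4)
d, alpha = 4, 0.5
theta_true = rng.normal(size=d)          # hidden parameter (illustrative)
theta = np.zeros(d)
V = np.eye(d)                            # design matrix for the bonus
X, Y = [], []                            # observed contexts and 0/1 rewards

sigmoid = lambda z: 1 / (1 + np.exp(-z))

for t in range(200):
    arms = [rng.normal(size=d) for _ in range(3)]
    score = lambda x: x @ theta + alpha * np.sqrt(x @ np.linalg.solve(V, x))
    x = max(arms, key=score)
    r = rng.random() < sigmoid(x @ theta_true)   # Bernoulli reward
    X.append(x); Y.append(float(r)); V += np.outer(x, x)
    Xa, Ya = np.array(X), np.array(Y)
    for _ in range(20):                  # crude MLE refit by gradient ascent
        theta += 0.05 * Xa.T @ (Ya - sigmoid(Xa @ theta)) / len(Ya)

cos = theta @ theta_true / (np.linalg.norm(theta) * np.linalg.norm(theta_true))
print(f"alignment with true parameter: {cos:.2f}")
```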
Thompson sampling for contextual bandits with linear payoffs
S Agrawal, N Goyal - International Conference on Machine Learning, 2013
Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a
randomized algorithm based on Bayesian ideas, and has recently generated significant …
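Per round, the algorithm analyzed here amounts to drawing a parameter from a Gaussian centered at the ridge estimate and acting greedily on the draw. A minimal sketch with an assumed posterior scale v and toy rewards:

```python
# Thompson sampling for linear payoffs: sample theta from a Gaussian posterior
# around the ridge estimate, then play the arm that the sample rates highest.
import numpy as np

rng = np.random.default_rng(5)
d, v = 5, 0.5                            # dimension, posterior scale (assumed)
B = np.eye(d)                            # posterior precision
f = np.zeros(d)                          # reward-weighted context sum

for t in range(300):
    mu = np.linalg.solve(B, f)           # ridge / posterior mean
    cov = v**2 * np.linalg.inv(B)
    theta = rng.multivariate_normal(mu, cov)  # posterior sample
    arms = [rng.normal(size=d) for _ in range(3)]
    x = max(arms, key=lambda a: a @ theta)    # greedy on the sample
    r = x[0] + 0.1 * rng.normal()             # toy reward (illustrative)
    B += np.outer(x, x)
    f += r * x
```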
Learning to optimize via posterior sampling
D Russo, B Van Roy - Mathematics of Operations Research, 2014
This paper considers the use of a simple posterior sampling algorithm to balance between
exploration and exploitation when learning to optimize actions such as in multiarmed bandit …
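In the simplest multi-armed setting, posterior sampling is a few lines: keep a Beta posterior per arm, draw one sample from each, and play the argmax; the arm means below are made up.

```python
# Posterior (Thompson) sampling for Bernoulli arms: maintain a Beta posterior
# per arm, draw one sample from each, and play the arm whose sample is largest.
import numpy as np

rng = np.random.default_rng(6)
p = [0.3, 0.5, 0.7]                      # hidden arm means (illustrative)
a = np.ones(3)                           # Beta successes + 1
b = np.ones(3)                           # Beta failures + 1

for t in range(2000):
    i = int(np.argmax(rng.beta(a, b)))   # one posterior draw per arm
    if rng.random() < p[i]:
        a[i] += 1
    else:
        b[i] += 1

print("posterior means:", a / (a + b))
```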