Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Regret analysis of stochastic and nonstochastic multi-armed bandit problems
Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …
Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Competitive caching with machine learned advice
Traditional online algorithms encapsulate decision making under uncertainty, and give ways
to hedge against all possible future events, while guaranteeing a nearly optimal solution, as …
Improving online algorithms via ML predictions
In this work we study the problem of using machine-learned predictions to improve
performance of online algorithms. We consider two classical problems, ski rental and non …
A survey of learning in multiagent environments: Dealing with non-stationarity
The key challenge in multiagent learning is learning a best response to the behaviour of
other agents, which may be non-stationary: if the other agents adapt their strategy as well …
Online scheduling via learned weights
Online algorithms are a hallmark of worst case optimization under uncertainty. On the other
hand, in practice, the input is often far from worst case, and has some predictable …
Stochastic bandits robust to adversarial corruptions
We introduce a new model of stochastic bandits with adversarial corruptions which aims to
capture settings where most of the input follows a stochastic pattern but some fraction of it …
Better algorithms for stochastic bandits with adversarial corruptions
We study the stochastic multi-armed bandits problem in the presence of adversarial
corruption. We present a new algorithm for this problem whose regret is nearly optimal …
Nearly optimal algorithms for linear contextual bandits with adversarial corruptions
We study the linear contextual bandit problem in the presence of adversarial corruption,
where the reward at each round is corrupted by an adversary, and the corruption level (i.e. …