[BOOK][B] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Efficient and targeted COVID-19 border testing via reinforcement learning
Throughout the coronavirus disease 2019 (COVID-19) pandemic, countries have relied on a
variety of ad hoc border control protocols to allow for non-essential travel while safeguarding …
Exploration-exploitation in constrained MDPs
In many sequential decision-making problems, the goal is to optimize a utility function while
satisfying a set of constraints on different utilities. This learning problem is formalized …
Bandits with knapsacks
Multi-armed bandit problems are the predominant theoretical model of exploration-
exploitation tradeoffs in learning, and they have countless applications ranging from medical …
Feature-based dynamic pricing
We consider the problem faced by a firm that receives highly differentiated products in an
online fashion. The firm needs to price these products to sell them to its customer base …
A unifying framework for online optimization with long-term constraints
We study online learning problems in which a decision maker has to take a sequence of
decisions subject to $m$ long-term constraints. The goal of the decision maker is to …
Meta dynamic pricing: Transfer learning across experiments
We study the problem of learning shared structure across a sequence of dynamic pricing
experiments for related products. We consider a practical formulation in which the unknown …
Adversarial bandits with knapsacks
We consider Bandits with Knapsacks (henceforth, BwK), a general model for multi-armed
bandits under supply/budget constraints. In particular, a bandit algorithm needs to solve a …
Fast algorithms for online stochastic convex programming
We introduce the online stochastic Convex Programming (CP) problem, a very general
version of stochastic online problems which allows arbitrary concave objectives and convex …