[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Introduction to online convex optimization

E Hazan - Foundations and Trends® in Optimization, 2016 - nowpublishers.com
This monograph portrays optimization as a process. In many practical applications the
environment is so complex that it is infeasible to lay out a comprehensive theoretical model …

Beyond UCB: Optimal and efficient contextual bandits with regression oracles

D Foster, A Rakhlin - International Conference on Machine …, 2020 - proceedings.mlr.press
A fundamental challenge in contextual bandits is to develop flexible, general-purpose
algorithms with computational requirements no worse than classical supervised learning …

Feature-based dynamic pricing

MC Cohen, I Lobel, R Paes Leme - Management Science, 2020 - pubsonline.informs.org
We consider the problem faced by a firm that receives highly differentiated products in an
online fashion. The firm needs to price these products to sell them to its customer base …

Adapting to misspecification in contextual bandits

DJ Foster, C Gentile, M Mohri… - Advances in Neural …, 2020 - proceedings.neurips.cc
A major research direction in contextual bandits is to develop algorithms that are
computationally efficient, yet support flexible, general-purpose function approximation …

A new algorithm for non-stationary contextual bandits: Efficient, optimal and parameter-free

Y Chen, CW Lee, H Luo… - Conference on Learning …, 2019 - proceedings.mlr.press
We propose the first contextual bandit algorithm that is parameter-free, efficient, and optimal
in terms of dynamic regret. Specifically, our algorithm achieves $\mathcal{O}(\min\{\sqrt …

Adversarial bandits with knapsacks

N Immorlica, K Sankararaman, R Schapire… - Journal of the ACM, 2022 - dl.acm.org
We consider Bandits with Knapsacks (henceforth, BwK), a general model for multi-armed
bandits under supply/budget constraints. In particular, a bandit algorithm needs to solve a …

New oracle-efficient algorithms for private synthetic data release

G Vietri, G Tian, M Bun, T Steinke… - … Conference on Machine …, 2020 - proceedings.mlr.press
We present three new algorithms for constructing differentially private synthetic data—a
sanitized version of a sensitive dataset that approximately preserves the answers to a large …

Efficient contextual bandits in non-stationary worlds

H Luo, CY Wei, A Agarwal… - Conference on Learning …, 2018 - proceedings.mlr.press
Most contextual bandit algorithms minimize regret against the best fixed policy, a
questionable benchmark for non-stationary environments that are ubiquitous in applications …