Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Competitive caching with machine learned advice

T Lykouris, S Vassilvitskii - Journal of the ACM (JACM), 2021 - dl.acm.org
Traditional online algorithms encapsulate decision making under uncertainty, and give ways
to hedge against all possible future events, while guaranteeing a nearly optimal solution, as …

Dual mirror descent for online allocation problems

S Balseiro, H Lu, V Mirrokni - International Conference on …, 2020 - proceedings.mlr.press
We consider online allocation problems with concave revenue functions and resource
constraints, which are central problems in revenue management and online advertising. In …

Corruption-robust offline reinforcement learning with general function approximation

C Ye, R Yang, Q Gu, T Zhang - Advances in Neural …, 2023 - proceedings.neurips.cc
We investigate the problem of corruption robustness in offline reinforcement learning (RL)
with general function approximation, where an adversary can corrupt each sample in the …

Adapting to misspecification in contextual bandits

DJ Foster, C Gentile, M Mohri… - Advances in Neural …, 2020 - proceedings.neurips.cc
A major research direction in contextual bandits is to develop algorithms that are
computationally efficient, yet support flexible, general-purpose function approximation …

Better algorithms for stochastic bandits with adversarial corruptions

A Gupta, T Koren, K Talwar - Conference on Learning …, 2019 - proceedings.mlr.press
We study the stochastic multi-armed bandits problem in the presence of adversarial
corruption. We present a new algorithm for this problem whose regret is nearly optimal …

Nearly optimal algorithms for linear contextual bandits with adversarial corruptions

J He, D Zhou, T Zhang, Q Gu - Advances in neural …, 2022 - proceedings.neurips.cc
We study the linear contextual bandit problem in the presence of adversarial corruption,
where the reward at each round is corrupted by an adversary, and the corruption level (i.e. …

Feature-based dynamic pricing

MC Cohen, I Lobel, R Paes Leme - Management Science, 2020 - pubsonline.informs.org
We consider the problem faced by a firm that receives highly differentiated products in an
online fashion. The firm needs to price these products to sell them to its customer base …

Corruption-robust algorithms with uncertainty weighting for nonlinear contextual bandits and Markov decision processes

C Ye, W Xiong, Q Gu, T Zhang - International Conference on …, 2023 - proceedings.mlr.press
Despite the significant interest and progress in reinforcement learning (RL) problems with
adversarial corruption, current works are either confined to the linear setting or lead to an …

Corruption-robust exploration in episodic reinforcement learning

T Lykouris, M Simchowitz… - … on Learning Theory, 2021 - proceedings.mlr.press
We initiate the study of episodic reinforcement learning under adversarial corruptions in both
the rewards and the transition probabilities of the underlying system extending recent results …