Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Competitive caching with machine learned advice
Traditional online algorithms encapsulate decision making under uncertainty, and give ways
to hedge against all possible future events, while guaranteeing a nearly optimal solution, as …
Dual mirror descent for online allocation problems
We consider online allocation problems with concave revenue functions and resource
constraints, which are central problems in revenue management and online advertising. In …
Corruption-robust offline reinforcement learning with general function approximation
We investigate the problem of corruption robustness in offline reinforcement learning (RL)
with general function approximation, where an adversary can corrupt each sample in the …
Adapting to misspecification in contextual bandits
A major research direction in contextual bandits is to develop algorithms that are
computationally efficient, yet support flexible, general-purpose function approximation …
Better algorithms for stochastic bandits with adversarial corruptions
We study the stochastic multi-armed bandits problem in the presence of adversarial
corruption. We present a new algorithm for this problem whose regret is nearly optimal …
Nearly optimal algorithms for linear contextual bandits with adversarial corruptions
We study the linear contextual bandit problem in the presence of adversarial corruption,
where the reward at each round is corrupted by an adversary, and the corruption level (i.e. …
Feature-based dynamic pricing
We consider the problem faced by a firm that receives highly differentiated products in an
online fashion. The firm needs to price these products to sell them to its customer base …
Corruption-robust algorithms with uncertainty weighting for nonlinear contextual bandits and Markov decision processes
Despite the significant interest and progress in reinforcement learning (RL) problems with
adversarial corruption, current works are either confined to the linear setting or lead to an …
Corruption-robust exploration in episodic reinforcement learning
We initiate the study of episodic reinforcement learning under adversarial corruptions in both
the rewards and the transition probabilities of the underlying system, extending recent results …