[Book][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Provable self-play algorithms for competitive reinforcement learning

Y Bai, C Jin - International conference on machine learning, 2020 - proceedings.mlr.press
Self-play, where the algorithm learns by playing against itself without requiring any direct
supervision, has become the new weapon in modern Reinforcement Learning (RL) for …

Stochastic multi-armed-bandit problem with non-stationary rewards

O Besbes, Y Gur, A Zeevi - Advances in neural information …, 2014 - proceedings.neurips.cc
In a multi-armed bandit (MAB) problem, a gambler needs to choose at each round of play
one of K arms, each characterized by an unknown reward distribution. Reward realizations …

Stochastic bandits robust to adversarial corruptions

T Lykouris, V Mirrokni, R Paes Leme - … of the 50th Annual ACM SIGACT …, 2018 - dl.acm.org
We introduce a new model of stochastic bandits with adversarial corruptions which aims to
capture settings where most of the input follows a stochastic pattern but some fraction of it …

Adapting to misspecification in contextual bandits

DJ Foster, C Gentile, M Mohri… - Advances in Neural …, 2020 - proceedings.neurips.cc
A major research direction in contextual bandits is to develop algorithms that are
computationally efficient, yet support flexible, general-purpose function approximation …

Control strategies for physically simulated characters performing two-player competitive sports

J Won, D Gopinath, J Hodgins - ACM Transactions on Graphics (TOG), 2021 - dl.acm.org
In two-player competitive sports, such as boxing and fencing, athletes often demonstrate
efficient and tactical movements during a competition. In this paper, we develop a learning …

Boltzmann exploration done right

N Cesa-Bianchi, C Gentile… - Advances in neural …, 2017 - proceedings.neurips.cc
Boltzmann exploration is a classic strategy for sequential decision-making under
uncertainty, and is one of the most standard tools in Reinforcement Learning (RL). Despite …

Learning in repeated auctions with budgets: Regret minimization and equilibrium

SR Balseiro, Y Gur - Management Science, 2019 - pubsonline.informs.org
In online advertising markets, advertisers often purchase ad placements through bidding in
repeated auctions based on realized viewer information. We study how budget-constrained …

Better algorithms for stochastic bandits with adversarial corruptions

A Gupta, T Koren, K Talwar - Conference on Learning …, 2019 - proceedings.mlr.press
We study the stochastic multi-armed bandits problem in the presence of adversarial
corruption. We present a new algorithm for this problem whose regret is nearly optimal …