[BOOK][B] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Provable self-play algorithms for competitive reinforcement learning
Self-play, where the algorithm learns by playing against itself without requiring any direct
supervision, has become the new weapon in modern Reinforcement Learning (RL) for …
Stochastic multi-armed-bandit problem with non-stationary rewards
In a multi-armed bandit (MAB) problem a gambler needs to choose at each round of play
one of K arms, each characterized by an unknown reward distribution. Reward realizations …
Stochastic bandits robust to adversarial corruptions
We introduce a new model of stochastic bandits with adversarial corruptions which aims to
capture settings where most of the input follows a stochastic pattern but some fraction of it …
Adapting to misspecification in contextual bandits
A major research direction in contextual bandits is to develop algorithms that are
computationally efficient, yet support flexible, general-purpose function approximation …
Control strategies for physically simulated characters performing two-player competitive sports
In two-player competitive sports, such as boxing and fencing, athletes often demonstrate
efficient and tactical movements during a competition. In this paper, we develop a learning …
Boltzmann exploration done right
N Cesa-Bianchi, C Gentile… - Advances in neural …, 2017 - proceedings.neurips.cc
Boltzmann exploration is a classic strategy for sequential decision-making under
uncertainty, and is one of the most standard tools in Reinforcement Learning (RL). Despite …
Learning in repeated auctions with budgets: Regret minimization and equilibrium
SR Balseiro, Y Gur - Management Science, 2019 - pubsonline.informs.org
In online advertising markets, advertisers often purchase ad placements through bidding in
repeated auctions based on realized viewer information. We study how budget-constrained …
Better algorithms for stochastic bandits with adversarial corruptions
We study the stochastic multi-armed bandits problem in the presence of adversarial
corruption. We present a new algorithm for this problem whose regret is nearly optimal …