[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

A modern introduction to online learning

F Orabona - arXiv preprint arXiv:1912.13213, 2019 - arxiv.org
In this monograph, I introduce the basic concepts of Online Learning through a modern view
of Online Convex Optimization. Here, online learning refers to the framework of regret …

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Introduction to online convex optimization

E Hazan - Foundations and Trends® in Optimization, 2016 - nowpublishers.com
This monograph portrays optimization as a process. In many practical applications the
environment is so complex that it is infeasible to lay out a comprehensive theoretical model …

A survey of learning in multiagent environments: Dealing with non-stationarity

P Hernandez-Leal, M Kaisers, T Baarslag… - arXiv preprint arXiv …, 2017 - arxiv.org
The key challenge in multiagent learning is learning a best response to the behaviour of
other agents, which may be non-stationary: if the other agents adapt their strategy as well …

Regret analysis of stochastic and nonstochastic multi-armed bandit problems

S Bubeck, N Cesa-Bianchi - Foundations and Trends® in …, 2012 - nowpublishers.com
Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …

Online control with adversarial disturbances

N Agarwal, B Bullins, E Hazan… - International …, 2019 - proceedings.mlr.press
We study the control of linear dynamical systems with adversarial disturbances, as opposed
to statistical noise. We present an efficient algorithm that achieves nearly-tight regret bounds …

Online learning algorithms

N Cesa-Bianchi, F Orabona - Annual review of statistics and its …, 2021 - annualreviews.org
Online learning is a framework for the design and analysis of algorithms that build predictive
models by processing data one at a time. Besides being computationally efficient, online …

Logarithmic regret for online control

N Agarwal, E Hazan, K Singh - Advances in Neural …, 2019 - proceedings.neurips.cc
We study optimal regret bounds for control in linear dynamical systems under adversarially
changing strongly convex cost functions, given the knowledge of transition dynamics. This …

Rotting bandits

N Levine, K Crammer… - Advances in Neural …, 2017 - proceedings.neurips.cc
The Multi-Armed Bandits (MAB) framework highlights the trade-off between
acquiring new knowledge (Exploration) and leveraging available knowledge (Exploitation) …