A modern introduction to online learning

F Orabona - arXiv preprint arXiv:1912.13213, 2019 - arxiv.org
In this monograph, I introduce the basic concepts of Online Learning through a modern view
of Online Convex Optimization. Here, online learning refers to the framework of regret …
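To make "regret" concrete, here is a minimal sketch of projected online gradient descent on one-dimensional linear losses f_t(x) = g_t * x over [-1, 1]. Regret is the learner's cumulative loss minus the loss of the best fixed point chosen in hindsight. The loss sequence, the domain, and the step size eta are hypothetical choices for illustration, not taken from the monograph.

```python
def online_gradient_descent(grads, eta=0.1, radius=1.0):
    """Projected OGD on linear losses f_t(x) = g_t * x over [-radius, radius]."""
    x = 0.0
    plays = []
    for g in grads:
        plays.append(x)                              # commit to x_t before g_t is revealed
        x = min(radius, max(-radius, x - eta * g))   # gradient step, then project back
    return plays


def regret(plays, grads, radius=1.0):
    """Cumulative loss minus the loss of the best fixed point in hindsight."""
    learner_loss = sum(g * x for g, x in zip(grads, plays))
    # a linear loss sum(g) * x is minimized at an endpoint of the interval
    best_fixed_loss = -radius * abs(sum(grads))
    return learner_loss - best_fixed_loss


# hypothetical loss sequence: two rounds of +1, then one round of -1, repeated
grads = [1.0 if t % 3 else -1.0 for t in range(300)]
plays = online_gradient_descent(grads)
print(regret(plays, grads))
```

With a fixed step size, the regret against any fixed point u is at most (x_1 - u)^2 / (2 * eta) + (eta / 2) * sum(g_t^2), so it grows sublinearly only if eta is tuned to the horizon.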

Regret analysis of stochastic and nonstochastic multi-armed bandit problems

S Bubeck, N Cesa-Bianchi - Foundations and Trends® in …, 2012 - nowpublishers.com
Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …
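One standard way to resolve the exploration-exploitation trade-off in the stochastic setting is an optimistic index rule. Below is a minimal sketch of UCB1: each arm's score is its empirical mean plus a bonus sqrt(2 ln t / n_i) that shrinks as the arm is pulled more. The Bernoulli arm means, horizon, and seed are hypothetical, and the true means are used only to compute pseudo-regret. This is not claimed to be the monograph's own presentation.

```python
import math
import random


def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on Bernoulli arms; return pull counts and pseudo-regret."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k        # number of pulls per arm
    sums = [0.0] * k        # total observed reward per arm
    best = max(arm_means)
    pseudo_regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1     # pull every arm once to initialize the estimates
        else:
            # empirical mean plus an exploration bonus that shrinks with pulls
            arm = max(
                range(k),
                key=lambda i: sums[i] / counts[i]
                + math.sqrt(2 * math.log(t) / counts[i]),
            )
        reward = 1.0 if rng.random() < arm_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        pseudo_regret += best - arm_means[arm]  # gap of the chosen arm
    return counts, pseudo_regret


counts, pseudo_regret = ucb1([0.2, 0.5, 0.7], horizon=5000)
print(counts, pseudo_regret)
```

Suboptimal arms keep being tried occasionally because their bonus grows with log t, but the best arm ends up receiving most of the pulls.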

Forecasting electricity consumption by aggregating specialized experts: A review of the sequential aggregation of specialized experts, with an application to Slovakian …

M Devaine, P Gaillard, Y Goude, G Stoltz - Machine Learning, 2013 - Springer
We consider the setting of sequential prediction of arbitrary sequences based on specialized
experts. We first provide a review of the relevant literature and present two theoretical …

Adaptive subgradient methods for online learning and stochastic optimization

J Duchi, E Hazan, Y Singer - Journal of Machine Learning Research, 2011 - jmlr.org
We present a new family of subgradient methods that dynamically incorporate knowledge of
the geometry of the data observed in earlier iterations to perform more informative gradient …
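A minimal sketch of the diagonal form of such an adaptive update (commonly called AdaGrad): each coordinate's step is divided by the square root of that coordinate's accumulated squared gradients. Only the diagonal, unregularized variant is shown. The objective, the learning rate lr, and eps are hypothetical choices for illustration.

```python
import math


def adagrad(grad_fn, x0, lr=0.5, steps=100, eps=1e-8):
    """Diagonal AdaGrad: per-coordinate step sizes from accumulated squared gradients."""
    x = list(x0)
    sq_sum = [0.0] * len(x)   # running sum of squared gradients, one per coordinate
    for t in range(steps):
        g = grad_fn(x, t)     # in the online setting the loss may change with t
        for i in range(len(x)):
            sq_sum[i] += g[i] ** 2
            # coordinates with large past gradients take proportionally smaller steps
            x[i] -= lr * g[i] / (math.sqrt(sq_sum[i]) + eps)
    return x


def badly_scaled_grad(x, t):
    """Gradient of the hypothetical f(x) = 50*x[0]**2 + 0.5*x[1]**2 (t is unused)."""
    return [100.0 * x[0], 1.0 * x[1]]


print(adagrad(badly_scaled_grad, [1.0, 1.0]))
```

A single global step of 0.5 would make plain gradient descent diverge on the first coordinate (x becomes x - 0.5 * 100x = -49x). With the accumulated-gradient normalization, both coordinates shrink at essentially the same rate despite the 100x difference in curvature.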

[BOOK] Optimization for machine learning

S Sra, S Nowozin, SJ Wright - 2011 - books.google.com
An up-to-date account of the interplay between optimization and machine learning,
accessible to students and researchers in both communities. The interplay between …

The multiplicative weights update method: a meta-algorithm and applications

S Arora, E Hazan, S Kale - Theory of Computing, 2012 - theoryofcomputing.org
Algorithms in varied fields use the idea of maintaining a distribution over a certain set and
use the multiplicative update rule to iteratively change these weights. Their analyses are …
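A minimal sketch of the multiplicative update in the form w_i ← w_i (1 - eta * m_i), where m_i in [-1, 1] is expert i's cost in the current round. The learner plays the distribution p_i = w_i / sum_j w_j. The three-expert cost sequence and eta = 0.1 are hypothetical.

```python
def multiplicative_weights(cost_rounds, eta=0.1):
    """Maintain weights over n experts; each round's costs must lie in [-1, 1]."""
    n = len(cost_rounds[0])
    weights = [1.0] * n
    total_cost = 0.0
    for costs in cost_rounds:
        z = sum(weights)
        probs = [w / z for w in weights]                        # current distribution
        total_cost += sum(p * c for p, c in zip(probs, costs))  # expected cost this round
        # multiplicative update: shrink the weight of costly experts
        weights = [w * (1 - eta * c) for w, c in zip(weights, costs)]
    z = sum(weights)
    return [w / z for w in weights], total_cost


# hypothetical costs: experts 0 and 1 alternate between 1 and 0, expert 2 always costs 0.3
rounds = [[float(t % 2), float((t + 1) % 2), 0.3] for t in range(200)]
probs, total = multiplicative_weights(rounds)
print(probs, total)
```

Experts 0 and 1 average a cost of 0.5 per round, so over time the weight concentrates on expert 2, and the learner's total expected cost approaches that of the best expert.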

Online learning with predictable sequences

A Rakhlin, K Sridharan - Conference on Learning Theory, 2013 - proceedings.mlr.press
We present methods for online linear optimization that take advantage of benign (as
opposed to worst-case) sequences. Specifically if the sequence encountered by the learner …
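A minimal sketch of how a learner can exploit a predictable sequence: optimistic online gradient descent. This is optimistic mirror descent with a squared-Euclidean regularizer, using the previous gradient as the hint M_t, which is one natural choice of hint. Each round the learner plays a step taken with the hint, then updates a secondary iterate with the revealed gradient. The 1-D domain [-1, 1], eta, and the constant loss sequence are hypothetical.

```python
def clip(x, radius=1.0):
    """Euclidean projection onto the interval [-radius, radius]."""
    return min(radius, max(-radius, x))


def optimistic_ogd(grads, eta=0.1):
    """Optimistic OGD with hint M_t = previous gradient (M_1 = 0)."""
    y = 0.0       # secondary iterate, updated with the true gradients
    hint = 0.0    # prediction of the next gradient
    plays = []
    for g in grads:
        x = clip(y - eta * hint)   # play a step taken with the hint
        plays.append(x)
        y = clip(y - eta * g)      # update with the revealed gradient
        hint = g                   # predict that the next gradient repeats
    return plays


def plain_ogd(grads, eta=0.1):
    """Projected OGD without hints, for comparison."""
    x, plays = 0.0, []
    for g in grads:
        plays.append(x)
        x = clip(x - eta * g)
    return plays


def regret(plays, grads, radius=1.0):
    """Cumulative linear loss minus that of the best fixed point in [-radius, radius]."""
    best_fixed = -radius * abs(sum(grads))
    return sum(g * x for g, x in zip(grads, plays)) - best_fixed


grads = [1.0] * 50   # a perfectly predictable (constant) sequence
print(regret(plain_ogd(grads), grads), regret(optimistic_ogd(grads), grads))
```

On this constant sequence the hint is exact after the first round, so the optimistic learner reaches the optimal endpoint sooner and accrues less regret than plain OGD with the same step size.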

Learning in games: a systematic review

RJ Qin, Y Yu - Science China Information Sciences, 2024 - Springer
Game theory studies the mathematical models for self-interested individuals. Nash
equilibrium is arguably the most central solution in game theory. While finding the Nash …

The best of both worlds: Stochastic and adversarial bandits

S Bubeck, A Slivkins - Conference on Learning Theory, 2012 - proceedings.mlr.press
We present a new bandit algorithm, SAO (Stochastic and Adversarial Optimal) whose regret
is (essentially) optimal both for adversarial rewards and for stochastic rewards. Specifically …

Online optimization with gradual variations

CK Chiang, T Yang, CJ Lee… - … on Learning Theory, 2012 - proceedings.mlr.press
We study the online convex optimization problem, in which an online algorithm has to make
repeated decisions with convex loss functions and hopes to achieve a small regret. We …