No-regret learning in time-varying zero-sum games
Learning from repeated play in a fixed two-player zero-sum game is a classic problem in
game theory and online learning. We consider a variant of this problem where the game …
Explore no more: Improved high-probability regret bounds for non-stochastic bandits
G Neu - Advances in Neural Information Processing …, 2015 - proceedings.neurips.cc
This work addresses the problem of regret minimization in non-stochastic multi-armed bandit
problems, focusing on performance guarantees that hold with high probability. Such results …
Unconstrained online learning with unbounded losses
Algorithms for online learning typically require one or more boundedness assumptions: that
the domain is bounded, that the losses are Lipschitz, or both. In this paper, we develop a …
On the convergence of no-regret learning dynamics in time-varying games
Most of the literature on learning in games has focused on the restrictive setting where the
underlying repeated game does not change over time. Much less is known about the …
Achieving all with no parameters: AdaNormalHedge
We study the classic online learning problem of predicting with expert advice, and propose a
truly parameter-free and adaptive algorithm that achieves several objectives simultaneously …
Improved dynamic regret for non-degenerate functions
Recently, there has been a growing research interest in the analysis of dynamic regret,
which measures the performance of an online learner against a sequence of local …
Dynamic regret of strongly adaptive methods
To cope with changing environments, recent developments in online learning have
introduced the concepts of adaptive regret and dynamic regret independently. In this paper …
Dynamic regret of adversarial linear mixture MDPs
We study reinforcement learning in episodic inhomogeneous MDPs with adversarial
full-information rewards and the unknown transition kernel. We consider the linear mixture …
Adaptivity and non-stationarity: Problem-dependent dynamic regret for online convex optimization
We investigate online convex optimization in non-stationary environments and choose
dynamic regret as the performance measure, defined as the difference between cumulative …
Impossible tuning made possible: A new expert algorithm and its applications
We resolve the long-standing "impossible tuning" issue for the classic expert problem and
show that it is in fact possible to achieve regret $O\left(\sqrt{(\ln d)\sum_t \ell_{t,i}^2}\right)$ …