Google Scholar

Explore no more: Improved high-probability regret bounds for non-stochastic bandits

G Neu - Advances in Neural Information Processing …, 2015 - proceedings.neurips.cc

This work addresses the problem of regret minimization in non-stochastic multi-armed bandit
problems, focusing on performance guarantees that hold with high probability. Such results …

Unconstrained online learning with unbounded losses

A Jacobsen, A Cutkosky - International Conference on …, 2023 - proceedings.mlr.press

Algorithms for online learning typically require one or more boundedness assumptions: that
the domain is bounded, that the losses are Lipschitz, or both. In this paper, we develop a …

On the convergence of no-regret learning dynamics in time-varying games

I Anagnostides, I Panageas… - Advances in Neural …, 2023 - proceedings.neurips.cc

Most of the literature on learning in games has focused on the restrictive setting where the
underlying repeated game does not change over time. Much less is known about the …

Achieving all with no parameters: AdaNormalHedge

H Luo, RE Schapire - Conference on Learning Theory, 2015 - proceedings.mlr.press

We study the classic online learning problem of predicting with expert advice, and propose a
truly parameter-free and adaptive algorithm that achieves several objectives simultaneously …

Improved dynamic regret for non-degenerate functions

L Zhang, T Yang, J Yi, R Jin… - Advances in Neural …, 2017 - proceedings.neurips.cc

Recently, there has been a growing research interest in the analysis of dynamic regret,
which measures the performance of an online learner against a sequence of local …

Dynamic regret of strongly adaptive methods

L Zhang, T Yang, ZH Zhou - International conference on …, 2018 - proceedings.mlr.press

To cope with changing environments, recent developments in online learning have
introduced the concepts of adaptive regret and dynamic regret independently. In this paper …

Dynamic regret of adversarial linear mixture MDPs

LF Li, P Zhao, ZH Zhou - Advances in Neural Information …, 2024 - proceedings.neurips.cc

We study reinforcement learning in episodic inhomogeneous MDPs with adversarial full-
information rewards and the unknown transition kernel. We consider the linear mixture …

Adaptivity and non-stationarity: Problem-dependent dynamic regret for online convex optimization

P Zhao, YJ Zhang, L Zhang, ZH Zhou - Journal of Machine Learning …, 2024 - jmlr.org

We investigate online convex optimization in non-stationary environments and choose
dynamic regret as the performance measure, defined as the difference between cumulative …