Online learning: A comprehensive survey

SCH Hoi, D Sahoo, J Lu, P Zhao - Neurocomputing, 2021 - Elsevier
Online learning represents a family of machine learning methods, where a learner attempts
to tackle some predictive (or any type of decision-making) task by learning from a sequence …

A tutorial on Thompson sampling

DJ Russo, B Van Roy, A Kazerouni… - … and Trends® in …, 2018 - nowpublishers.com
Thompson sampling is an algorithm for online decision problems where actions are taken
sequentially in a manner that must balance between exploiting what is known to maximize …

International conference on machine learning

W Li, C Wang, G Cheng, Q Song - Transactions on machine learning …, 2023 - par.nsf.gov
In this paper, we make the key delineation on the roles of resolution and statistical
uncertainty in hierarchical bandits-based black-box optimization algorithms, guiding a more …

The statistical complexity of interactive decision making

DJ Foster, SM Kakade, J Qian, A Rakhlin - arXiv preprint arXiv:2112.13487, 2021 - arxiv.org
A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …

[BOOK] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Derivative-free optimization methods

J Larson, M Menickelly, SM Wild - Acta Numerica, 2019 - cambridge.org
In many optimization problems arising from scientific, engineering and artificial intelligence
applications, objective and constraint functions are available only as the output of a black …

Minimax regret bounds for reinforcement learning

MG Azar, I Osband, R Munos - International conference on …, 2017 - proceedings.mlr.press
We consider the problem of provably optimal exploration in reinforcement learning for finite
horizon MDPs. We show that an optimistic modification to value iteration achieves a regret …

Hyperband: A novel bandit-based approach to hyperparameter optimization

L Li, K Jamieson, G DeSalvo, A Rostamizadeh… - Journal of Machine …, 2018 - jmlr.org
Performance of machine learning algorithms depends critically on identifying a good set of
hyperparameters. While recent approaches use Bayesian optimization to adaptively select …

Taking the human out of the loop: A review of Bayesian optimization

B Shahriari, K Swersky, Z Wang… - Proceedings of the …, 2015 - ieeexplore.ieee.org
Big Data applications are typically associated with systems involving large numbers of
users, massive complex software systems, and large-scale heterogeneous computing and …

Neural contextual bandits with UCB-based exploration

D Zhou, L Li, Q Gu - International Conference on Machine …, 2020 - proceedings.mlr.press
We study the stochastic contextual bandit problem, where the reward is generated from an
unknown function with additive noise. No assumption is made about the reward function …