Online learning: A comprehensive survey
Online learning represents a family of machine learning methods, where a learner attempts
to tackle some predictive (or any type of decision-making) task by learning from a sequence …
to tackle some predictive (or any type of decision-making) task by learning from a sequence …
A tutorial on thompson sampling
Thompson sampling is an algorithm for online decision problems where actions are taken
sequentially in a manner that must balance between exploiting what is known to maximize …
sequentially in a manner that must balance between exploiting what is known to maximize …
[PDF][PDF] International conference on machine learning
W Li, C Wang, G Cheng, Q Song - Transactions on machine learning …, 2023 - par.nsf.gov
In this paper, we make the key delineation on the roles of resolution and statistical
uncertainty in hierarchical bandits-based black-box optimization algorithms, guiding a more …
uncertainty in hierarchical bandits-based black-box optimization algorithms, guiding a more …
The statistical complexity of interactive decision making
A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …
[КНИГА][B] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
and the multi-armed bandit model is a commonly used framework to address it. This …
Derivative-free optimization methods
In many optimization problems arising from scientific, engineering and artificial intelligence
applications, objective and constraint functions are available only as the output of a black …
applications, objective and constraint functions are available only as the output of a black …
Minimax regret bounds for reinforcement learning
We consider the problem of provably optimal exploration in reinforcement learning for finite
horizon MDPs. We show that an optimistic modification to value iteration achieves a regret …
horizon MDPs. We show that an optimistic modification to value iteration achieves a regret …
Hyperband: A novel bandit-based approach to hyperparameter optimization
Performance of machine learning algorithms depends critically on identifying a good set of
hyperparameters. While recent approaches use Bayesian optimization to adaptively select …
hyperparameters. While recent approaches use Bayesian optimization to adaptively select …
Taking the human out of the loop: A review of Bayesian optimization
Big Data applications are typically associated with systems involving large numbers of
users, massive complex software systems, and large-scale heterogeneous computing and …
users, massive complex software systems, and large-scale heterogeneous computing and …
Neural contextual bandits with ucb-based exploration
We study the stochastic contextual bandit problem, where the reward is generated from an
unknown function with additive noise. No assumption is made about the reward function …
unknown function with additive noise. No assumption is made about the reward function …