Google Scholar

Online learning: A comprehensive survey

SCH Hoi, D Sahoo, J Lu, P Zhao - Neurocomputing, 2021 - Elsevier

Online learning represents a family of machine learning methods, where a learner attempts
to tackle some predictive (or any type of decision-making) task by learning from a sequence …

Cited by 898 (6 versions)

[PDF] nowpublishers.com

A tutorial on Thompson sampling

DJ Russo, B Van Roy, A Kazerouni… - … and Trends® in …, 2018 - nowpublishers.com

Thompson sampling is an algorithm for online decision problems where actions are taken
sequentially in a manner that must balance between exploiting what is known to maximize …

Cited by 1296 (34 versions)

[PDF] nsf.gov

[PDF] International conference on machine learning

W Li, C Wang, G Cheng, Q Song - Transactions on machine learning …, 2023 - par.nsf.gov

In this paper, we make the key delineation on the roles of resolution and statistical
uncertainty in hierarchical bandits-based black-box optimization algorithms, guiding a more …

Cited by 1735

[PDF] arxiv.org

The statistical complexity of interactive decision making

DJ Foster, SM Kakade, J Qian, A Rakhlin - arXiv preprint arXiv:2112.13487, 2021 - arxiv.org

A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …

Cited by 207 (6 versions)

[PDF] tor-lattimore.com

[BOOK] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com

Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Cited by 3299 (9 versions)

[PDF] arxiv.org

Derivative-free optimization methods

J Larson, M Menickelly, SM Wild - Acta Numerica, 2019 - cambridge.org

In many optimization problems arising from scientific, engineering and artificial intelligence
applications, objective and constraint functions are available only as the output of a black …

Cited by 513 (9 versions)

[PDF] mlr.press

Minimax regret bounds for reinforcement learning

MG Azar, I Osband, R Munos - International conference on …, 2017 - proceedings.mlr.press

We consider the problem of provably optimal exploration in reinforcement learning for finite
horizon MDPs. We show that an optimistic modification to value iteration achieves a regret …

Cited by 902 (5 versions)

[PDF] jmlr.org

Hyperband: A novel bandit-based approach to hyperparameter optimization

L Li, K Jamieson, G DeSalvo, A Rostamizadeh… - Journal of Machine …, 2018 - jmlr.org

Performance of machine learning algorithms depends critically on identifying a good set of
hyperparameters. While recent approaches use Bayesian optimization to adaptively select …

Cited by 3173 (13 versions)

[PDF] ieee.org

Taking the human out of the loop: A review of Bayesian optimization

B Shahriari, K Swersky, Z Wang… - Proceedings of the …, 2015 - ieeexplore.ieee.org

Big Data applications are typically associated with systems involving large numbers of
users, massive complex software systems, and large-scale heterogeneous computing and …

Cited by 6061 (14 versions)

[PDF] mlr.press

Neural contextual bandits with UCB-based exploration

D Zhou, L Li, Q Gu - International Conference on Machine …, 2020 - proceedings.mlr.press

We study the stochastic contextual bandit problem, where the reward is generated from an
unknown function with additive noise. No assumption is made about the reward function …

Cited by 296 (10 versions)

X-Armed Bandits.

Online learning: A comprehensive survey
