Google Scholar

A unified framework for stochastic optimization

WB Powell - European journal of operational research, 2019 - Elsevier

Stochastic optimization is an umbrella term that includes over a dozen fragmented
communities, using a patchwork of sometimes overlapping notational systems with …

Cited by 366 - All 9 versions

[PDF] arxiv.org

The statistical complexity of interactive decision making

DJ Foster, SM Kakade, J Qian, A Rakhlin - arXiv preprint arXiv:2112.13487, 2021 - arxiv.org

A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …

Cited by 220 - All 5 versions

[PDF] arxiv.org

Machine learning testing: Survey, landscapes and horizons

JM Zhang, M Harman, L Ma… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org

This paper provides a comprehensive survey of techniques for testing machine learning
systems, that is, Machine Learning Testing (ML testing) research. It covers 144 papers on testing …

Cited by 1007 - All 14 versions

[PDF] nowpublishers.com

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com

Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Cited by 1272 - All 7 versions

[PDF] jmlr.org

Hyperband: A novel bandit-based approach to hyperparameter optimization

L Li, K Jamieson, G DeSalvo, A Rostamizadeh… - Journal of Machine …, 2018 - jmlr.org

Performance of machine learning algorithms depends critically on identifying a good set of
hyperparameters. While recent approaches use Bayesian optimization to adaptively select …

Cited by 3180 - All 11 versions

[PDF] mlr.press

Reward-free exploration for reinforcement learning

C Jin, A Krishnamurthy… - … on Machine Learning, 2020 - proceedings.mlr.press

Exploration is widely regarded as one of the most challenging aspects of reinforcement
learning (RL), with many naive approaches succumbing to exponential sample complexity …

Cited by 273 - All 6 versions

[PDF] projecteuclid.org

Time-uniform, nonparametric, nonasymptotic confidence sequences

SR Howard, A Ramdas, J McAuliffe, J Sekhon - The Annals of Statistics, 2021 - JSTOR

A confidence sequence is a sequence of confidence intervals that is uniformly valid over an
unbounded time horizon. Our work develops confidence sequences whose widths go to …

Cited by 327 - All 9 versions

[PDF] mlr.press

Non-stochastic best arm identification and hyperparameter optimization

K Jamieson, A Talwalkar - Artificial intelligence and statistics, 2016 - proceedings.mlr.press

Motivated by the task of hyperparameter optimization, we introduce the non-stochastic
best-arm identification problem. We identify an attractive algorithm for this setting that makes …

Cited by 786 - All 8 versions

[PDF] mlr.press

Optimal best arm identification with fixed confidence

A Garivier, E Kaufmann - Conference on Learning Theory, 2016 - proceedings.mlr.press

We give a complete characterization of the complexity of best-arm identification in one-
parameter bandit problems. We prove a new, tight lower bound on the sample complexity …

Cited by 437 - All 15 versions

[PDF] mlr.press

Leveraging offline data in online reinforcement learning

A Wagenmaker, A Pacchiano - International Conference on …, 2023 - proceedings.mlr.press

Two central paradigms have emerged in the reinforcement learning (RL) community: online
RL and offline RL. In the online RL setting, the agent has no prior knowledge of the …

Cited by 47 - All 7 versions

On the complexity of best-arm identification in multi-armed bandit models

A unified framework for stochastic optimization
