A unified framework for stochastic optimization
WB Powell - European journal of operational research, 2019 - Elsevier
Stochastic optimization is an umbrella term that includes over a dozen fragmented
communities, using a patchwork of sometimes overlapping notational systems with …
The statistical complexity of interactive decision making
A fundamental challenge in interactive learning and decision making, ranging from bandit
problems to reinforcement learning, is to provide sample-efficient, adaptive learning …
Machine learning testing: Survey, landscapes and horizons
This paper provides a comprehensive survey of techniques for testing machine learning
systems; Machine Learning Testing (ML testing) research. It covers 144 papers on testing …
Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Hyperband: A novel bandit-based approach to hyperparameter optimization
Performance of machine learning algorithms depends critically on identifying a good set of
hyperparameters. While recent approaches use Bayesian optimization to adaptively select …
Reward-free exploration for reinforcement learning
Exploration is widely regarded as one of the most challenging aspects of reinforcement
learning (RL), with many naive approaches succumbing to exponential sample complexity …
Time-uniform, nonparametric, nonasymptotic confidence sequences
A confidence sequence is a sequence of confidence intervals that is uniformly valid over an
unbounded time horizon. Our work develops confidence sequences whose widths go to …
Non-stochastic best arm identification and hyperparameter optimization
Motivated by the task of hyperparameter optimization, we introduce the non-stochastic
best-arm identification problem. We identify an attractive algorithm for this setting that makes …
Optimal best arm identification with fixed confidence
We give a complete characterization of the complexity of best-arm identification in one-
parameter bandit problems. We prove a new, tight lower bound on the sample complexity …
Leveraging offline data in online reinforcement learning
Two central paradigms have emerged in the reinforcement learning (RL) community: online
RL and offline RL. In the online RL setting, the agent has no prior knowledge of the …