Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Regret minimization with performative feedback
In performative prediction, the deployment of a predictive model triggers a shift in the data
distribution. As these shifts are typically unknown ahead of time, the learner needs to deploy …
Robust Lipschitz bandits to adversarial corruptions
Lipschitz bandit is a variant of stochastic bandits that deals with a continuous arm set
defined on a metric space, where the reward function is subject to a Lipschitz constraint. In …
Multiobjective Lipschitz bandits under lexicographic ordering
This paper studies the multiobjective bandit problem under lexicographic ordering, wherein
the learner aims to simultaneously maximize $ m $ objectives hierarchically. The only …
Zeroth-order non-convex learning via hierarchical dual averaging
We propose a hierarchical version of dual averaging for zeroth-order online non-convex
optimization, i.e., learning processes where, at each stage, the optimizer is facing an …
Quantifying the Merits of Network-Assist Online Learning in Optimizing Network Protocols
Optimizing network protocols is crucial for improving application performance. Recent
research works use multi-armed bandit (MAB) online learning methods to address network …
Intelligent informative frequency band searching assisted by a dynamic bandit tree method for machine fault diagnosis
The fault informative frequency band searching is crucial to envelope analysis-based
machine fault diagnosis. Its success often depends on effective filters. However, existing …
Thompson sampling-based recursive block elimination for dynamic assignment under limited budget in pure-exploration
In this paper, we investigate Thompson sampling-based sequential block elimination
approaches for dynamic assignment problems in a pure-exploration Multi-Armed Bandit …
New Perspectives in Online Contract Design
S Zuo - arXiv preprint arXiv:2403.07143, 2024 - arxiv.org
This work studies the repeated principal-agent problem from an online learning perspective.
The principal's goal is to learn the optimal contract that maximizes her utility through …
Online Defense Strategies for Reinforcement Learning Against Adaptive Reward Poisoning
We consider the problem of defense against reward-poisoning attacks in reinforcement
learning and formulate it as a game in T rounds between a defender and an adaptive …