Contextual information-directed sampling
Information-directed sampling (IDS) has recently demonstrated its potential as a
data-efficient reinforcement learning algorithm. However, it is still unclear what is the right …
Leveraging demonstrations to improve online learning: Quality matters
We investigate the extent to which offline demonstration data can improve online learning. It
is natural to expect some improvement, but the question is how, and by how much? We …
Information directed sampling for stochastic bandits with graph feedback
We consider stochastic multi-armed bandit problems with graph feedback, where the
decision maker is allowed to observe the neighboring actions of the chosen action. We allow …
Bandits with feedback graphs and switching costs
We study the adversarial multi-armed bandit problem where the learner is supplied with
partial observations modeled by a feedback graph and where shifting to a new …
Small-loss bounds for online learning with partial information
We consider the problem of adversarial (non-stochastic) online learning with partial
information feedback, where at each round, a decision maker selects an action from a finite …
Satisficing in time-sensitive bandit learning
Much of the recent literature on bandit learning focuses on algorithms that aim to converge
on an optimal action. One shortcoming is that this orientation does not account for time …
Feedback graph regret bounds for Thompson sampling and UCB
We study the stochastic multi-armed bandit problem with the graph-based feedback
structure introduced by Mannor and Shamir. We analyze the performance of the two most …
Simultaneously learning stochastic and adversarial bandits with general graph feedback
The problem of online learning with graph feedback has been extensively studied in the
literature due to its generality and potential to model various learning tasks. Existing works …
Understanding bandits with graph feedback
The bandit problem with graph feedback, proposed in [Mannor and Shamir, NeurIPS 2011],
is modeled by a directed graph $G=(V, E)$ where $V$ is the collection of bandit arms, and …
First-order Bayesian regret analysis of Thompson sampling
We address online combinatorial optimization when the player has a prior over the
adversary's sequence of losses. In this setting, Russo and Van Roy proposed an information …