Bypassing the monster: A faster and simpler optimal algorithm for contextual bandits under realizability
We consider the general (stochastic) contextual bandit problem under the realizability
assumption, that is, the expected reward, as a function of contexts and actions, belongs to a …
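This line of work is known for reducing contextual bandits to offline regression via inverse-gap-weighted exploration. A minimal sketch of that action-selection rule, assuming a generic regression oracle supplies the per-action reward estimates (all names below are illustrative):

```python
import numpy as np

def igw_action_probs(predicted_rewards, gamma):
    """Inverse-gap-weighted exploration over K actions.

    predicted_rewards: length-K array of oracle reward estimates.
    gamma: scale parameter; larger values exploit more aggressively.
    """
    K = len(predicted_rewards)
    best = int(np.argmax(predicted_rewards))
    probs = np.zeros(K)
    for a in range(K):
        if a != best:
            # Probability shrinks with the estimated gap to the leader.
            probs[a] = 1.0 / (K + gamma * (predicted_rewards[best] - predicted_rewards[a]))
    probs[best] = 1.0 - probs.sum()  # leftover mass on the greedy action
    return probs

# Illustrative use: three actions scored by a regression oracle.
probs = igw_action_probs(np.array([0.2, 0.5, 0.4]), gamma=10.0)
action = np.random.choice(len(probs), p=probs)
```

The greedy action keeps the leftover probability mass, so exploration concentrates on actions whose estimated gap to the leader is small.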
Nearly optimal algorithms for linear contextual bandits with adversarial corruptions
We study the linear contextual bandit problem in the presence of adversarial corruption,
where the reward at each round is corrupted by an adversary, and the corruption level (i.e. …
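The corruption-robust algorithms in this line of work build on optimism for linear bandits; for reference, a minimal sketch of the standard (uncorrupted) LinUCB-style baseline they modify, with illustrative names:

```python
import numpy as np

class LinUCB:
    """Standard LinUCB-style baseline for the uncorrupted linear
    contextual bandit; corruption-robust variants adjust the
    confidence widths."""

    def __init__(self, dim, alpha=1.0, reg=1.0):
        self.alpha = alpha              # confidence-width multiplier
        self.A = reg * np.eye(dim)      # regularized Gram matrix
        self.b = np.zeros(dim)          # sum of reward-weighted features

    def select(self, feats):
        """feats: (K, dim) array, one feature vector per action."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b          # ridge estimate of the reward vector
        widths = np.sqrt(np.einsum("kd,de,ke->k", feats, A_inv, feats))
        return int(np.argmax(feats @ theta + self.alpha * widths))

    def update(self, x, reward):
        self.A += np.outer(x, x)
        self.b += reward * x
```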
Proportional response: Contextual bandits for simple and cumulative regret minimization
In many applications, e.g., in healthcare and e-commerce, the goal of a contextual bandit may
be to learn an optimal treatment assignment policy at the end of the experiment. That is, to …
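For reference, the two objectives this snippet contrasts are, in standard notation (with \(\mu_a\) the mean reward of arm \(a\), \(\mu^{*}=\max_a \mu_a\), \(A_t\) the arm played at round \(t\), and \(\hat{A}_T\) the arm recommended at the end of the experiment):

\[
R_T=\sum_{t=1}^{T}\bigl(\mu^{*}-\mu_{A_t}\bigr)\ \text{(cumulative regret)},
\qquad
r_T=\mu^{*}-\mu_{\hat{A}_T}\ \text{(simple regret)}.
\]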
Metadata-based multi-task bandits with Bayesian hierarchical models
How to explore efficiently is a central problem in multi-armed bandits. In this paper, we
introduce the metadata-based multi-task bandit problem, where the agent needs to solve a …
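One way to picture the metadata-based multi-task idea is a two-level Gaussian model in which each task's arm means sit around shared population-level means. The sketch below pairs that model with Thompson sampling and a simple empirical-Bayes shrinkage toward the pooled means (an illustration of the hierarchy, not the paper's algorithm; all parameters are made up):

```python
import numpy as np

rng = np.random.default_rng(0)
n_tasks, n_arms, sigma, tau = 4, 3, 1.0, 0.5

# Hypothetical generative model: each task's arm means are drawn
# around shared, population-level means.
shared_mu = rng.normal(0.0, 1.0, n_arms)
task_mu = shared_mu + rng.normal(0.0, tau, (n_tasks, n_arms))

counts = np.zeros((n_tasks, n_arms))
sums = np.zeros((n_tasks, n_arms))

for t in range(2000):
    task = t % n_tasks
    # Empirical-Bayes prior: pool all tasks' data to estimate the
    # shared arm means, then shrink each task's estimate toward them.
    pooled = sums.sum(0) / np.maximum(counts.sum(0), 1)
    prec = 1 / tau**2 + counts[task] / sigma**2      # posterior precision
    mean = (pooled / tau**2 + sums[task] / sigma**2) / prec
    sample = rng.normal(mean, 1 / np.sqrt(prec))     # Thompson draw
    arm = int(np.argmax(sample))
    reward = rng.normal(task_mu[task, arm], sigma)
    counts[task, arm] += 1
    sums[task, arm] += reward
```

Treating the pooled means as a fixed prior ignores their own uncertainty; a full hierarchical posterior would propagate it, but the shrinkage across tasks is the point of the sketch.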
Provable model-based nonlinear bandit and reinforcement learning: Shelve optimism, embrace virtual curvature
This paper studies model-based bandit and reinforcement learning (RL) with nonlinear
function approximations. We propose to study convergence to approximate local maxima …
Contextual bandits in a survey experiment on charitable giving: Within-experiment outcomes versus policy learning
We design and implement an adaptive experiment (a "contextual bandit") to learn a targeted
treatment assignment policy, where the goal is to use a participant's survey responses to …
Corralling a larger band of bandits: A case study on switching regret for linear bandits
We consider the problem of combining and learning over a set of adversarial bandit
algorithms with the goal of adaptively tracking the best one on the fly. The Corral algorithm of …
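For intuition about what "corralling" means here, a naive Exp3-style master over base bandit algorithms is sketched below; the Corral line of work replaces this with log-barrier updates and increasing learning rates precisely because the naive scheme can starve unstable base algorithms. The environment and base learners are toy stand-ins:

```python
import numpy as np

rng = np.random.default_rng(0)

class EpsGreedyBase:
    """Trivial stand-in base algorithm (hypothetical, for illustration)."""
    def __init__(self, n_actions, eps=0.1):
        self.n, self.eps = n_actions, eps
        self.counts = np.zeros(n_actions)
        self.sums = np.zeros(n_actions)
    def act(self):
        if rng.random() < self.eps:
            return int(rng.integers(self.n))
        return int(np.argmax(self.sums / np.maximum(self.counts, 1)))
    def feed(self, a, r):
        self.counts[a] += 1
        self.sums[a] += r

def corral_naive(base_algs, env, T, eta=0.05):
    """Naive Exp3-style master: pick a base algorithm, play its action,
    and return importance-weighted feedback to that base."""
    M = len(base_algs)
    log_w = np.zeros(M)
    for _ in range(T):
        p = np.exp(log_w - log_w.max())
        p /= p.sum()
        i = rng.choice(M, p=p)
        a = base_algs[i].act()
        r = env(a)
        base_algs[i].feed(a, r / p[i])   # importance-weighted feedback
        log_w[i] += eta * r / p[i]       # Exp3 update on the chosen base
    return log_w

# Toy Bernoulli environment with 3 actions (hypothetical).
means = np.array([0.2, 0.5, 0.8])
env = lambda a: float(rng.random() < means[a])
weights = corral_naive([EpsGreedyBase(3), EpsGreedyBase(3, eps=0.3)], env, T=2000)
```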
Flexible and efficient contextual bandits with heterogeneous treatment effect oracles
Contextual bandit algorithms often estimate reward models to inform decision-making.
However, true rewards can contain action-independent redundancies that are not relevant …
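One way to read "treatment effect oracle" is that, with two actions, the decision depends only on the difference in expected rewards, so action-independent structure in the reward can be ignored. A T-learner-style sketch of that idea under a hypothetical linear model (not the paper's algorithm; all data and coefficients are made up):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated logged data: reward = baseline(x) + a * tau(x) + noise,
# where baseline is action-independent and only tau matters for the policy.
d, n = 3, 500
X = rng.normal(size=(n, d))
A = rng.integers(0, 2, n)
baseline = X @ np.array([1.0, -1.0, 0.5])   # action-independent component
tau = X @ np.array([0.5, 0.5, 0.0])         # heterogeneous treatment effect
r = baseline + A * tau + rng.normal(0, 0.1, n)

def fit(Xa, ra):
    # Least-squares regression of rewards on contexts for one arm.
    return np.linalg.lstsq(Xa, ra, rcond=None)[0]

w0 = fit(X[A == 0], r[A == 0])
w1 = fit(X[A == 1], r[A == 1])
tau_hat = X @ (w1 - w0)                     # estimated treatment effect
policy = (tau_hat > 0).astype(int)          # treat iff the effect looks positive
```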
The fragility of optimized bandit algorithms
L Fan, PW Glynn - Operations Research, 2024 - pubsonline.informs.org
Much of the literature on optimal design of bandit algorithms is based on minimization of
expected regret. It is well known that algorithms that are optimal over certain exponential …
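The optimality the snippet alludes to is asymptotic optimality against the Lai–Robbins lower bound, which for suboptimality gaps \(\Delta_a=\mu^{*}-\mu_a\) reads

\[
\liminf_{T\to\infty}\frac{\mathbb{E}[R_T]}{\log T}\;\ge\;\sum_{a:\,\Delta_a>0}\frac{\Delta_a}{\mathrm{KL}(\mu_a,\mu^{*})},
\]

where \(\mathrm{KL}\) denotes the Kullback–Leibler divergence between the reward distribution of arm \(a\) and that of the best arm. Fan and Glynn's point, per the title, is that algorithms tuned to meet such expected-regret benchmarks can be fragile in other respects.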
Best-of-three-worlds linear bandit algorithm with variance-adaptive regret bounds
S Ito, K Takemura - The Thirty Sixth Annual Conference on …, 2023 - proceedings.mlr.press
This paper proposes a linear bandit algorithm that is adaptive to environments at two
different levels of hierarchy. At the higher level, the proposed algorithm adapts to a variety of …