Bypassing the monster: A faster and simpler optimal algorithm for contextual bandits under realizability

D Simchi-Levi, Y Xu - Mathematics of Operations Research, 2022 - pubsonline.informs.org
We consider the general (stochastic) contextual bandit problem under the realizability
assumption, that is, the expected reward, as a function of contexts and actions, belongs to a …
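
The snippet truncates the realizability assumption; spelled out, it says there is a known function class F containing the true mean reward, i.e., there exists f* in F with E[r | x, a] = f*(x, a). The paper's algorithm is known to reduce the bandit problem to offline regression and to pick actions by inverse gap weighting over the regression estimate; below is a minimal Python sketch of that sampling rule only (the full algorithm also schedules the exploration parameter gamma across epochs, and the function name and example values here are illustrative).

    import numpy as np

    def inverse_gap_weighting(rewards_hat, gamma):
        """Action distribution from predicted rewards via inverse gap weighting."""
        K = len(rewards_hat)
        best = int(np.argmax(rewards_hat))
        gaps = rewards_hat[best] - rewards_hat      # nonnegative reward gaps
        probs = 1.0 / (K + gamma * gaps)            # low probability for large gaps
        probs[best] = 0.0
        probs[best] = 1.0 - probs.sum()             # leftover mass on the leader
        return probs

    # Example: sample one action for a 3-arm context
    rng = np.random.default_rng(0)
    p = inverse_gap_weighting(np.array([0.2, 0.5, 0.45]), gamma=50.0)
    action = rng.choice(3, p=p)

Larger gamma concentrates the distribution on the empirical leader, so gamma controls the exploration-exploitation trade-off.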

Nearly optimal algorithms for linear contextual bandits with adversarial corruptions

J He, D Zhou, T Zhang, Q Gu - Advances in neural …, 2022 - proceedings.neurips.cc
We study the linear contextual bandit problem in the presence of adversarial corruption,
where the reward at each round is corrupted by an adversary, and the corruption level (i.e., …
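
The snippet cuts off mid-definition; a standard formulation consistent with this setup (the symbols here are illustrative) is

    r_t = \langle \theta^*, x_{t, a_t} \rangle + c_t + \eta_t, \qquad C = \sum_{t=1}^{T} |c_t|,

where c_t is the adversary's corruption at round t, \eta_t is zero-mean noise, and the corruption level C bounds the total corruption (and may be unknown to the learner).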

Proportional response: Contextual bandits for simple and cumulative regret minimization

SK Krishnamurthy, R Zhan, S Athey… - Advances in Neural …, 2023 - proceedings.neurips.cc
In many applications, e.g., in healthcare and e-commerce, the goal of a contextual bandit may
be to learn an optimal treatment assignment policy at the end of the experiment. That is, to …
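
The two objectives named in the title differ in what is penalized; in standard notation (\pi^* denotes the optimal policy, and the symbols are illustrative),

    \text{cumulative regret: } R_T = \sum_{t=1}^{T} \mathbb{E}\big[ r(x_t, \pi^*(x_t)) - r(x_t, a_t) \big],
    \qquad
    \text{simple regret: } \mathbb{E}_{x}\big[ r(x, \pi^*(x)) - r(x, \hat{\pi}(x)) \big],

where \hat{\pi} is the policy returned at the end of the experiment. Cumulative regret charges every exploratory pull, while simple regret only measures the quality of the final policy, which is why the two objectives call for different amounts of exploration.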

Metadata-based multi-task bandits with Bayesian hierarchical models

R Wan, L Ge, R Song - Advances in Neural Information …, 2021 - proceedings.neurips.cc
How to explore efficiently is a central problem in multi-armed bandits. In this paper, we
introduce the metadata-based multi-task bandit problem, where the agent needs to solve a …
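
A minimal sketch of what a metadata-based hierarchical model can look like (the Gaussian form and the symbols g, \beta, \Sigma are illustrative, not necessarily the paper's):

    \theta_i \mid z_i \sim \mathcal{N}\big(g(z_i; \beta), \Sigma\big), \qquad
    r_{i,t} \mid a_{i,t} \sim \mathcal{N}\big(\theta_{i, a_{i,t}}, \sigma^2\big),

where z_i is the metadata of task i and the parameters (\beta, \Sigma) are shared across tasks. Because \beta and \Sigma are learned from all tasks, posterior inference on a new task's \theta_i starts from an informed prior, which is what makes exploration more efficient than solving each task in isolation.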

Provable model-based nonlinear bandit and reinforcement learning: Shelve optimism, embrace virtual curvature

K Dong, J Yang, T Ma - Advances in Neural Information …, 2021 - proceedings.neurips.cc
This paper studies model-based bandit and reinforcement learning (RL) with nonlinear
function approximations. We propose to study convergence to approximate local maxima …
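
One standard way to formalize the target of "approximate local maxima" (the paper's exact definition may differ; \epsilon and \delta here are illustrative):

    \theta \text{ is an } (\epsilon, \delta)\text{-approximate local maximum of } f
    \quad \text{if} \quad f(\theta) \ge f(\theta') - \epsilon \ \text{ for all } \theta' \text{ with } \|\theta' - \theta\| \le \delta.

Aiming for local rather than global maxima is what lets the analysis sidestep the worst-case hardness of globally optimizing a nonlinear reward model.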

Contextual bandits in a survey experiment on charitable giving: Within-experiment outcomes versus policy learning

S Athey, U Byambadalai, V Hadad… - arXiv preprint arXiv …, 2022 - arxiv.org
We design and implement an adaptive experiment (a "contextual bandit") to learn a targeted
treatment assignment policy, where the goal is to use a participant's survey responses to …

Corralling a larger band of bandits: A case study on switching regret for linear bandits

H Luo, M Zhang, P Zhao… - Conference on Learning …, 2022 - proceedings.mlr.press
We consider the problem of combining and learning over a set of adversarial bandit
algorithms with the goal of adaptively tracking the best one on the fly. The Corral algorithm of …
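
The architecture is a master algorithm that holds a distribution over base bandit algorithms, plays one per round, and feeds it importance-weighted feedback. The sketch below uses a plain EXP3-style exponential-weights master to show the shape of the loop; the actual Corral update instead uses log-barrier online mirror descent with per-base learning rates, and the base-algorithm interface (.act()/.update()) is an assumed one.

    import numpy as np

    def run_master(base_algs, T, get_loss, eta=0.1, seed=0):
        """EXP3-style master over base bandit algorithms (simplified
        stand-in for Corral's log-barrier mirror-descent update)."""
        M = len(base_algs)
        w = np.zeros(M)                        # log-weights over base algorithms
        rng = np.random.default_rng(seed)
        for _ in range(T):
            p = np.exp(w - w.max())
            p /= p.sum()
            i = rng.choice(M, p=p)             # sample which base algorithm acts
            action = base_algs[i].act()
            loss = get_loss(action)            # loss in [0, 1] from the environment
            est = loss / p[i]                  # importance-weighted loss estimate
            base_algs[i].update(action, est)   # only the sampled base learns
            w[i] -= eta * est
        return p

The importance weighting keeps the loss estimates unbiased, but it inflates their variance for rarely-picked bases, which is exactly the difficulty Corral's changing learning rates are designed to handle.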

Flexible and efficient contextual bandits with heterogeneous treatment effect oracles

AG Carranza, SK Krishnamurthy… - … Conference on Artificial …, 2023 - proceedings.mlr.press
Contextual bandit algorithms often estimate reward models to inform decision-making.
However, true rewards can contain action-independent redundancies that are not relevant …
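
The "action-independent redundancies" in the snippet suggest the usual decomposition (symbols illustrative):

    \mathbb{E}[r \mid x, a] = \phi(x) + \tau(x, a),

where \phi(x) is a baseline that does not depend on the action and \tau is the heterogeneous treatment effect. Since only \tau matters for comparing actions, an oracle that estimates \tau directly can be more statistically efficient than one that must also fit the nuisance term \phi.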

The fragility of optimized bandit algorithms

L Fan, PW Glynn - Operations Research, 2024 - pubsonline.informs.org
Much of the literature on optimal design of bandit algorithms is based on minimization of
expected regret. It is well known that algorithms that are optimal over certain exponential …
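
For concreteness, the expected regret being minimized is the standard quantity

    \mathbb{E}[R_T] = \sum_{t=1}^{T} \big(\mu^* - \mathbb{E}[\mu_{A_t}]\big)
                    = \sum_{a : \Delta_a > 0} \Delta_a \, \mathbb{E}[N_a(T)],

where \mu^* is the best mean reward, \Delta_a = \mu^* - \mu_a, and N_a(T) counts pulls of arm a; the concern flagged by the title is that tuning an algorithm to make this expectation optimal can leave its performance fragile in other respects.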

Best-of-three-worlds linear bandit algorithm with variance-adaptive regret bounds

S Ito, K Takemura - The Thirty Sixth Annual Conference on …, 2023 - proceedings.mlr.press
This paper proposes a linear bandit algorithm that is adaptive to environments at two
different levels of hierarchy. At the higher level, the proposed algorithm adapts to a variety of …
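
For context on the two levels the snippet mentions (hedged, since the snippet truncates the details): "best of three worlds" conventionally means near-optimal regret simultaneously in stochastic, adversarially corrupted, and fully adversarial regimes, and a variance-adaptive bound replaces the worst-case dependence on the horizon T with one through the cumulative variance, on the order of

    R_T = \tilde{O}\Big(\sqrt{\textstyle\sum_{t=1}^{T} \sigma_t^2}\Big),

where \sigma_t^2 is the variance of the loss at round t.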