Model selection in contextual stochastic bandit problems

A Pacchiano, M Phan… - Advances in …, 2020 - proceedings.neurips.cc
We study bandit model selection in stochastic environments. Our approach relies on a
master algorithm that selects between candidate base algorithms. We develop a master …
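Note: the snippet above names only the master/base interface, not the paper's actual selection scheme. As a generic illustration of that interface (and nothing more), the sketch below has a UCB-style master allocate rounds among candidate base learners; the epsilon-greedy base class, the environment callback, and all parameter values are invented for this example.

import numpy as np

rng = np.random.default_rng(0)

class EpsGreedyBase:
    # Toy stand-in for a candidate base algorithm: epsilon-greedy over n_arms arms.
    def __init__(self, n_arms, eps=0.1):
        self.n_arms, self.eps = n_arms, eps
        self.counts = np.zeros(n_arms)
        self.means = np.zeros(n_arms)

    def select(self):
        if rng.random() < self.eps:
            return int(rng.integers(self.n_arms))
        return int(np.argmax(self.means))

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.means[arm] += (reward - self.means[arm]) / self.counts[arm]

def run_master(bases, env, horizon):
    # Master loop: score each base by a UCB index over the rewards it has earned,
    # let the chosen base pick the arm, and feed the reward back only to that base.
    m = len(bases)
    pulls, totals = np.ones(m), np.zeros(m)
    for t in range(1, horizon + 1):
        scores = totals / pulls + np.sqrt(2.0 * np.log(t + 1) / pulls)
        i = int(np.argmax(scores))       # master selects a base algorithm
        arm = bases[i].select()          # base selects an arm
        reward = env(arm)                # stochastic reward from the environment
        bases[i].update(arm, reward)
        pulls[i] += 1
        totals[i] += reward
    return totals.sum()

# Toy usage: three Bernoulli arms, two bases differing only in exploration rate.
means = np.array([0.2, 0.5, 0.7])
env = lambda arm: float(rng.random() < means[arm])
print(run_master([EpsGreedyBase(3, eps=0.05), EpsGreedyBase(3, eps=0.3)], env, 2000))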

Learning personalized decision support policies

U Bhatt, V Chen, KM Collins, P Kamalaruban… - arXiv preprint arXiv …, 2023 - arxiv.org
Individual human decision-makers may benefit from different forms of support to improve
decision outcomes, but when will each form of support yield better outcomes? In this work …

Tracking most significant shifts in nonparametric contextual bandits

J Suk, S Kpotufe - Advances in Neural Information …, 2023 - proceedings.neurips.cc
We study nonparametric contextual bandits where Lipschitz mean reward functions may
change over time. We first establish the minimax dynamic regret rate in this less understood …

Dynamic contextual pricing with doubly non-parametric random utility models

E Chen, X Chen, L Gao, J Li - arXiv preprint arXiv:2405.06866, 2024 - arxiv.org
In the evolving landscape of digital commerce, adaptive dynamic pricing strategies are
essential for gaining a competitive edge. This paper introduces novel doubly …

Unifying offline causal inference and online bandit learning for data driven decision

Y Li, H **e, Y Lin, JCS Lui - Proceedings of the Web Conference 2021, 2021 - dl.acm.org
A fundamental question for companies with large amount of logged data is: How to use such
logged data together with incoming streaming data to make good decisions? Many …

The role of contextual information in best arm identification

M Kato, K Ariu - arXiv preprint arXiv:2106.14077, 2021 - arxiv.org
We study the best-arm identification problem with fixed confidence when contextual
(covariate) information is available in stochastic bandits. Although we can use contextual …

Adversarial rewards in universal learning for contextual bandits

M Blanchard, S Hanneke, P Jaillet - arXiv preprint arXiv:2302.07186, 2023 - arxiv.org
We study the fundamental limits of learning in contextual bandits, where a learner's rewards
depend on their actions and a known context, which extends the canonical multi-armed …

Adaptive algorithm for multi-armed bandit problem with high-dimensional covariates

W Qian, CK Ing, J Liu - Journal of the American Statistical …, 2024 - Taylor & Francis
This article studies an important sequential decision making problem known as the multi-
armed stochastic bandit problem with covariates. Under a linear bandit framework with high …
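Note: as a rough illustration of the high-dimensional (sparse linear) covariate setting only, the sketch below fits a per-arm LASSO on accumulated data after a short forced-exploration phase. It assumes scikit-learn is available; the function names, the forced-exploration schedule, and the refit threshold are made up here and do not reproduce the article's adaptive algorithm.

import numpy as np
from sklearn.linear_model import Lasso

def sparse_greedy_bandit(contexts, get_reward, horizon, n_arms,
                         alpha=0.05, forced_rounds=20):
    # Per-arm sparse linear estimates: force a few pulls of every arm first,
    # then play greedily on the fitted LASSO predictions.
    feats = {a: [] for a in range(n_arms)}
    rews = {a: [] for a in range(n_arms)}
    models = {a: None for a in range(n_arms)}
    for t in range(horizon):
        x = contexts(t)                           # high-dimensional covariate vector
        if t < forced_rounds * n_arms:
            arm = t % n_arms                      # forced exploration phase
        else:
            preds = [models[a].predict(x[None, :])[0] if models[a] is not None else 0.0
                     for a in range(n_arms)]
            arm = int(np.argmax(preds))
        r = get_reward(t, arm, x)
        feats[arm].append(x)
        rews[arm].append(r)
        if len(rews[arm]) >= 10:                  # refit the chosen arm's sparse model
            models[arm] = Lasso(alpha=alpha).fit(np.array(feats[arm]), np.array(rews[arm]))
    return models

# Toy usage: 100 covariates, 3 arms, sparse true coefficients per arm.
gen = np.random.default_rng(1)
betas = gen.normal(size=(3, 100)) * (gen.random((3, 100)) < 0.05)
sparse_greedy_bandit(lambda t: gen.normal(size=100),
                     lambda t, a, x: float(x @ betas[a] + 0.1 * gen.normal()),
                     horizon=600, n_arms=3)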

Thompson sampling in partially observable contextual bandits

H Park, MKS Faradonbeh - arXiv preprint arXiv:2402.10289, 2024 - arxiv.org
Contextual bandits constitute a classical framework for decision-making under uncertainty.
In this setting, the goal is to learn the arms of highest reward subject to contextual …
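Note: the partially observed contexts are the point of this paper; the sketch below shows only the standard fully observed baseline, Thompson sampling for a linear contextual bandit with a Gaussian posterior over the shared reward parameter. The contexts/get_reward callbacks, noise level, and prior scale are invented for the example.

import numpy as np

def linear_thompson_sampling(contexts, get_reward, horizon, d, sigma2=0.25, lam=1.0, seed=0):
    # Maintain a Gaussian posterior N(mu, sigma2 * A^{-1}) over the reward
    # parameter theta, sample from it each round, and act greedily.
    rng = np.random.default_rng(seed)
    A = lam * np.eye(d)                     # regularized design matrix
    b = np.zeros(d)
    total = 0.0
    for t in range(horizon):
        X = contexts(t)                     # shape (n_arms, d): per-arm feature vectors
        mu = np.linalg.solve(A, b)
        theta = rng.multivariate_normal(mu, sigma2 * np.linalg.inv(A))
        arm = int(np.argmax(X @ theta))     # greedy action under the sampled parameter
        r = get_reward(t, arm, X[arm])
        A += np.outer(X[arm], X[arm])       # rank-one posterior update
        b += r * X[arm]
        total += r
    return total

# Toy usage: 4 arms, 5-dimensional contexts, linear rewards with Gaussian noise.
d, n_arms = 5, 4
theta_star = np.linspace(0.1, 0.5, d)
gen = np.random.default_rng(7)
print(linear_thompson_sampling(lambda t: gen.normal(size=(n_arms, d)),
                               lambda t, a, x: float(x @ theta_star + 0.1 * gen.normal()),
                               horizon=500, d=d))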

Self-tuning bandits over unknown covariate-shifts

J Suk, S Kpotufe - Algorithmic Learning Theory, 2021 - proceedings.mlr.press
Bandits with covariates, aka contextual bandits, address situations where optimal
actions (or arms) at a given time $t$ depend on a context $x_t$, e.g., a new …