Online decision making with high-dimensional covariates
Big data have enabled decision makers to tailor decisions at the individual level in a variety
of domains, such as personalized medicine and online advertising. Doing so involves …
Beyond UCB: Optimal and efficient contextual bandits with regression oracles
A fundamental challenge in contextual bandits is to develop flexible, general-purpose
algorithms with computational requirements no worse than classical supervised learning …
Balanced linear contextual bandits
Contextual bandit algorithms are sensitive to the estimation method of the outcome model as
well as the exploration method used, particularly in the presence of rich heterogeneity or …
Estimation considerations in contextual bandits
Contextual bandit algorithms are sensitive to the estimation method of the outcome model as
well as the exploration method used, particularly in the presence of rich heterogeneity or …
Offline multi-action policy learning: Generalization and optimization
In many settings, a decision maker wishes to learn a rule, or policy, that maps from
observable characteristics of an individual to an action. Examples include selecting offers …
Contextual bandits with similarity information
A Slivkins - Proceedings of the 24th annual Conference On …, 2011 - proceedings.mlr.press
In a multi-armed bandit (MAB) problem, an online algorithm makes a sequence of choices.
In each round it chooses from a time-invariant set of alternatives and receives the payoff …
From ads to interventions: Contextual bandits in mobile health
The first paper on contextual bandits was written by Michael Woodroofe in 1979 (Journal of
the American Statistical Association, 74 (368), 799–806, 1979) but the term “contextual …
Instance-dependent complexity of contextual bandits and reinforcement learning: A disagreement-based perspective
In the classical multi-armed bandit problem, instance-dependent algorithms attain improved
performance on "easy" problems with a gap between the best and second-best arm. Are …
Contextual bandits for adapting treatment in a mouse model of de novo carcinogenesis
A Durand, C Achilleos, D Iacovides… - Machine learning …, 2018 - proceedings.mlr.press
In this work, we present a specific case study where we aim to design effective treatment
allocation strategies and validate these using a mouse model of skin cancer. Collecting data …
Distributionally robust policy evaluation and learning in offline contextual bandits
Policy learning using historical observational data is an important problem that has found
widespread applications. However, existing literature rests on the crucial assumption that …