Bilinear classes: A structural framework for provable generalization in RL

S Du, S Kakade, J Lee, S Lovett… - International …, 2021 - proceedings.mlr.press
This work introduces Bilinear Classes, a new structural framework, which permit
generalization in reinforcement learning in a wide variety of settings through the use of …

Neural contextual bandits with UCB-based exploration

D Zhou, L Li, Q Gu - International Conference on Machine …, 2020 - proceedings.mlr.press
We study the stochastic contextual bandit problem, where the reward is generated from an
unknown function with additive noise. No assumption is made about the reward function …

Beyond UCB: Optimal and efficient contextual bandits with regression oracles

D Foster, A Rakhlin - International Conference on Machine …, 2020 - proceedings.mlr.press
A fundamental challenge in contextual bandits is to develop flexible, general-purpose
algorithms with computational requirements no worse than classical supervised learning …

Improved algorithms for linear stochastic bandits

Y Abbasi-Yadkori, D Pál… - Advances in neural …, 2011 - proceedings.neurips.cc
We improve the theoretical analysis and empirical performance of algorithms for the
stochastic multi-armed bandit problem and the linear stochastic multi-armed bandit problem …

Contextual bandits with linear payoff functions

W Chu, L Li, L Reyzin… - Proceedings of the …, 2011 - proceedings.mlr.press
In this paper we study the contextual bandit problem (also known as the multi-armed bandit
problem with expert advice) for linear payoff functions. For $T$ rounds, $K$ actions, and d …

A contextual-bandit approach to personalized news article recommendation

L Li, W Chu, J Langford, RE Schapire - Proceedings of the 19th …, 2010 - dl.acm.org
Personalized web services strive to adapt their services (advertisements, news articles, etc.)
to individual users by making use of both content and user information. Despite a few recent …

Contextual Gaussian process bandit optimization

A Krause, C Ong - Advances in neural information …, 2011 - proceedings.neurips.cc
How should we design experiments to maximize performance of a complex system, taking
into account uncontrollable environmental conditions? How should we select relevant …

Bypassing the monster: A faster and simpler optimal algorithm for contextual bandits under realizability

D Simchi-Levi, Y Xu - Mathematics of Operations Research, 2022 - pubsonline.informs.org
We consider the general (stochastic) contextual bandit problem under the realizability
assumption, that is, the expected reward, as a function of contexts and actions, belongs to a …

Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms

L Li, W Chu, J Langford, X Wang - … conference on Web search and data …, 2011 - dl.acm.org
Contextual bandit algorithms have become popular for online recommendation systems
such as Digg, Yahoo! Buzz, and news recommendation in general. Offline evaluation of the …

From ads to interventions: Contextual bandits in mobile health

A Tewari, SA Murphy - Mobile health: sensors, analytic methods, and …, 2017 - Springer
The first paper on contextual bandits was written by Michael Woodroofe in 1979 (Journal of
the American Statistical Association, 74 (368), 799–806, 1979) but the term “contextual …