Bilinear classes: A structural framework for provable generalization in RL
This work introduces Bilinear Classes, a new structural framework, which permit
generalization in reinforcement learning in a wide variety of settings through the use of …
Neural contextual bandits with UCB-based exploration
We study the stochastic contextual bandit problem, where the reward is generated from an
unknown function with additive noise. No assumption is made about the reward function …
Beyond UCB: Optimal and efficient contextual bandits with regression oracles
A fundamental challenge in contextual bandits is to develop flexible, general-purpose
algorithms with computational requirements no worse than classical supervised learning …
Improved algorithms for linear stochastic bandits
We improve the theoretical analysis and empirical performance of algorithms for the
stochastic multi-armed bandit problem and the linear stochastic multi-armed bandit problem …
Contextual bandits with linear payoff functions
In this paper we study the contextual bandit problem (also known as the multi-armed bandit
problem with expert advice) for linear payoff functions. For $ T $ rounds, $ K $ actions, and d …
A contextual-bandit approach to personalized news article recommendation
Personalized web services strive to adapt their services (advertisements, news articles, etc.)
to individual users by making use of both content and user information. Despite a few recent …
Contextual Gaussian process bandit optimization
How should we design experiments to maximize performance of a complex system, taking
into account uncontrollable environmental conditions? How should we select relevant …
Bypassing the monster: A faster and simpler optimal algorithm for contextual bandits under realizability
We consider the general (stochastic) contextual bandit problem under the realizability
assumption, that is, the expected reward, as a function of contexts and actions, belongs to a …
Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms
Contextual bandit algorithms have become popular for online recommendation systems
such as Digg, Yahoo! Buzz, and news recommendation in general. Offline evaluation of the …
From ads to interventions: Contextual bandits in mobile health
The first paper on contextual bandits was written by Michael Woodroofe in 1979 (Journal of
the American Statistical Association, 74 (368), 799–806, 1979) but the term “contextual …