Google Scholar

Bilinear classes: A structural framework for provable generalization in RL

S Du, S Kakade, J Lee, S Lovett… - International …, 2021 - proceedings.mlr.press

Abstract This work introduces Bilinear Classes, a new structural framework, which permit
generalization in reinforcement learning in a wide variety of settings through the use of …

Cited by 251 · All 8 versions

[PDF] mlr.press

Neural contextual bandits with UCB-based exploration

D Zhou, L Li, Q Gu - International Conference on Machine …, 2020 - proceedings.mlr.press

We study the stochastic contextual bandit problem, where the reward is generated from an
unknown function with additive noise. No assumption is made about the reward function …

Cited by 306 · All 11 versions

[PDF] mlr.press

Beyond UCB: Optimal and efficient contextual bandits with regression oracles

D Foster, A Rakhlin - International conference on machine …, 2020 - proceedings.mlr.press

A fundamental challenge in contextual bandits is to develop flexible, general-purpose
algorithms with computational requirements no worse than classical supervised learning …

Cited by 245 · All 6 versions

[PDF] neurips.cc

Improved algorithms for linear stochastic bandits

Y Abbasi-Yadkori, D Pál… - Advances in neural …, 2011 - proceedings.neurips.cc

We improve the theoretical analysis and empirical performance of algorithms for the
stochastic multi-armed bandit problem and the linear stochastic multi-armed bandit problem …

Cited by 2203 · All 17 versions

[PDF] mlr.press

Contextual bandits with linear payoff functions

W Chu, L Li, L Reyzin… - Proceedings of the …, 2011 - proceedings.mlr.press

In this paper we study the contextual bandit problem (also known as the multi-armed bandit
problem with expert advice) for linear payoff functions. For $T$ rounds, $K$ actions, and d …

Cited by 1374 · All 12 versions

[PDF] arxiv.org

A contextual-bandit approach to personalized news article recommendation

L Li, W Chu, J Langford, RE Schapire - Proceedings of the 19th …, 2010 - dl.acm.org

Personalized web services strive to adapt their services (advertisements, news articles, etc.)
to individual users by making use of both content and user information. Despite a few recent …

Cited by 3635 · All 22 versions

[PDF] neurips.cc

Contextual Gaussian process bandit optimization

A Krause, C Ong - Advances in neural information …, 2011 - proceedings.neurips.cc

How should we design experiments to maximize performance of a complex system, taking
into account uncontrollable environmental conditions? How should we select relevant …

Cited by 515 · All 19 versions

[PDF] arxiv.org

Bypassing the monster: A faster and simpler optimal algorithm for contextual bandits under realizability

D Simchi-Levi, Y Xu - Mathematics of Operations Research, 2022 - pubsonline.informs.org

We consider the general (stochastic) contextual bandit problem under the realizability
assumption, that is, the expected reward, as a function of contexts and actions, belongs to a …

Cited by 140 · All 10 versions

[PDF] arxiv.org

Unbiased offline evaluation of contextual-bandit-based news article recommendation algorithms

L Li, W Chu, J Langford, X Wang - … conference on Web search and data …, 2011 - dl.acm.org

Contextual bandit algorithms have become popular for online recommendation systems
such as Digg, Yahoo! Buzz, and news recommendation in general. Offline evaluation of the …

Cited by 705 · All 9 versions

[PDF] ambujtewari.com

From ads to interventions: Contextual bandits in mobile health

A Tewari, SA Murphy - Mobile health: sensors, analytic methods, and …, 2017 - Springer

The first paper on contextual bandits was written by Michael Woodroofe in 1979 (Journal of
the American Statistical Association, 74 (368), 799–806, 1979) but the term “contextual …

Cited by 251 · All 5 versions

Reinforcement learning with immediate rewards and linear hypotheses

Bilinear classes: A structural framework for provable generalization in rl