Dynamic pricing and learning: historical origins, current research, and new directions
AV Den Boer - Surveys in operations research and management …, 2015 - Elsevier
The topic of dynamic pricing and learning has received a considerable amount of attention
in recent years, from different scientific communities. We survey these literature streams: we …
Neural Thompson sampling
W Zhang, D Zhou, L Li, Q Gu - International Conference on Learning Representations, 2021
Thompson Sampling (TS) is one of the most effective algorithms for solving contextual multi-
armed bandit problems. In this paper, we propose a new algorithm, called Neural Thompson …
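A rough sketch of the core mechanism, not the paper's exact algorithm: score each arm by sampling from a Gaussian centered at a small network's prediction, with variance taken from the network's gradient through an accumulated design matrix. The architecture, dimensions, and constants below are illustrative assumptions, and training of the network on observed rewards is omitted.

```python
# Illustrative Neural-Thompson-style scoring (a simplification, not the
# paper's exact algorithm): sample a reward estimate from a Gaussian whose
# mean is a tiny network's prediction and whose variance is gradient-based.
import numpy as np

rng = np.random.default_rng(0)
d, m, nu, lam = 4, 16, 0.5, 1.0             # context dim, width, scale, ridge
W1 = rng.normal(0, 1 / np.sqrt(d), (m, d))  # hidden layer weights
w2 = rng.normal(0, 1 / np.sqrt(m), m)       # output layer weights
U = lam * np.eye(d * m + m)                 # design matrix over flat params

def forward(x):
    z = W1 @ x
    return w2 @ np.maximum(z, 0.0), z       # ReLU network prediction

def grad(x):
    # Gradient of the prediction w.r.t. all parameters, flattened.
    _, z = forward(x)
    dW1 = np.outer(w2 * (z > 0), x)
    return np.concatenate([dW1.ravel(), np.maximum(z, 0.0)])

def ts_score(x):
    pred, _ = forward(x)
    g = grad(x)
    sigma2 = g @ np.linalg.solve(U, g) / m  # gradient-based uncertainty
    return rng.normal(pred, nu * np.sqrt(sigma2))

# One round over three hypothetical arms.
arms = [rng.normal(size=d) for _ in range(3)]
chosen = max(range(3), key=lambda i: ts_score(arms[i]))
U += np.outer(grad(arms[chosen]), grad(arms[chosen])) / m
# (Training W1, w2 on the observed reward is omitted for brevity.)
```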
[BOOK][B] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Neural contextual bandits with UCB-based exploration
D Zhou, L Li, Q Gu - International Conference on Machine Learning, 2020
We study the stochastic contextual bandit problem, where the reward is generated from an
unknown function with additive noise. No assumption is made about the reward function …
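As background, the linear special case that this line of work generalizes is the classic LinUCB rule: a ridge estimate of the reward parameter plus a bonus proportional to the confidence width. A minimal sketch, with an assumed bonus multiplier alpha and a toy reward:

```python
# LinUCB-style arm selection (the linear ancestor of UCB-based neural
# exploration): ridge estimate of the reward model plus a confidence bonus.
import numpy as np

rng = np.random.default_rng(1)
d, alpha = 5, 1.0
A = np.eye(d)                                   # ridge-regularized design matrix
b = np.zeros(d)                                 # accumulated reward-weighted contexts

def ucb_score(x):
    theta = np.linalg.solve(A, b)               # ridge estimate
    width = np.sqrt(x @ np.linalg.solve(A, x))  # confidence width
    return x @ theta + alpha * width

for t in range(100):
    arms = [rng.normal(size=d) for _ in range(3)]
    i = max(range(3), key=lambda a: ucb_score(arms[a]))
    r = arms[i][0] + 0.1 * rng.normal()         # toy reward: first coordinate
    A += np.outer(arms[i], arms[i])
    b += r * arms[i]
```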
Weight uncertainty in neural network
C Blundell, J Cornebise, K Kavukcuoglu, D Wierstra - International Conference on Machine Learning, 2015
We introduce a new, efficient, principled and backpropagation-compatible algorithm for
learning a probability distribution on the weights of a neural network, called Bayes by …
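The mechanism behind Bayes by Backprop is a factorized Gaussian over the weights trained through the reparameterization trick. Below is a minimal sketch of that mechanism for a single-weight linear model with hand-derived gradients and toy data; the objective here (squared loss plus a KL penalty to a standard normal prior) is a simplified stand-in for the paper's variational objective.

```python
# Minimal Bayes-by-Backprop-style sketch: a Gaussian posterior over a single
# weight, trained with the reparameterization trick (toy data, hand gradients).
import numpy as np

rng = np.random.default_rng(2)
x = rng.normal(size=50)
y = 3.0 * x + 0.1 * rng.normal(size=50)  # true weight is 3.0

mu, rho, lr = 0.0, -3.0, 0.01            # posterior params; sigma = softplus(rho)
for step in range(2000):
    sigma = np.log1p(np.exp(rho))        # softplus keeps sigma positive
    eps = rng.normal()
    w = mu + sigma * eps                 # reparameterized weight sample
    dw = np.sum((w * x - y) * x)         # d(squared loss)/dw
    # Add gradients of KL(N(mu, sigma^2) || N(0, 1)) to the data gradients.
    dmu = dw + mu
    dsigma = dw * eps + sigma - 1.0 / sigma
    drho = dsigma / (1.0 + np.exp(-rho)) # chain rule through softplus
    mu, rho = mu - lr * dmu, rho - lr * drho

print(f"posterior mean {mu:.2f}, std {np.log1p(np.exp(rho)):.3f}")
```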
Regret analysis of stochastic and nonstochastic multi-armed bandit problems
S Bubeck, N Cesa-Bianchi - Foundations and Trends® in Machine Learning, 2012
Multi-armed bandit problems are the most basic examples of sequential decision problems
with an exploration-exploitation trade-off. This is the balance between staying with the option …
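The trade-off is concrete in the classic UCB1 rule: after playing each arm once, pick the arm maximizing its empirical mean plus a bonus that shrinks as the arm is sampled. A sketch on made-up Bernoulli arms:

```python
# UCB1 on toy Bernoulli arms: exploit the best empirical mean while the
# sqrt(2 ln t / n) bonus forces occasional exploration of under-sampled arms.
import numpy as np

rng = np.random.default_rng(3)
p = [0.3, 0.5, 0.7]                      # hidden arm means (illustrative)
n = np.zeros(3)                          # pull counts
s = np.zeros(3)                          # reward sums

for t in range(1, 2001):
    if t <= 3:
        i = t - 1                        # play each arm once first
    else:
        ucb = s / n + np.sqrt(2 * np.log(t) / n)
        i = int(np.argmax(ucb))
    s[i] += rng.random() < p[i]
    n[i] += 1

print("pull counts:", n)                 # most pulls concentrate on arm 2
```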
Provably optimal algorithms for generalized linear contextual bandits
L Li, Y Lu, D Zhou - International Conference on Machine Learning, 2017
Contextual bandits are widely used in Internet services from news recommendation to
advertising, and to Web search. Generalized linear models (logistic regression in …
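A simplified sketch in the spirit of the paper's UCB-GLM (not its exact statement): refit a logistic model to past pulls by gradient ascent and add a confidence-width bonus. The toy environment, step sizes, and bonus multiplier below are assumptions.

```python
# Simplified logistic contextual bandit with a UCB-style bonus: fit theta by
# a few gradient steps on past data, then pick the arm with the largest
# predicted logit plus confidence width (in the spirit of UCB-GLM).
import numpy as np

rng = np.random.default_rng(4)
d, alpha = 4, 0.5
theta_true = rng.normal(size=d)          # hidden parameter (illustrative)
theta = np.zeros(d)
V = np.eye(d)                            # design matrix for the bonus
X, Y = [], []                            # observed contexts and 0/1 rewards

sigmoid = lambda z: 1 / (1 + np.exp(-z))

for t in range(200):
    arms = [rng.normal(size=d) for _ in range(3)]
    score = lambda x: x @ theta + alpha * np.sqrt(x @ np.linalg.solve(V, x))
    x = max(arms, key=score)
    r = rng.random() < sigmoid(x @ theta_true)   # Bernoulli reward
    X.append(x); Y.append(float(r)); V += np.outer(x, x)
    Xa, Ya = np.array(X), np.array(Y)
    for _ in range(20):                  # crude MLE refit by gradient ascent
        theta += 0.05 * Xa.T @ (Ya - sigmoid(Xa @ theta)) / len(Ya)

cos = theta @ theta_true / (np.linalg.norm(theta) * np.linalg.norm(theta_true))
print(f"alignment with true parameter: {cos:.2f}")
```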
Thompson sampling for contextual bandits with linear payoffs
S Agrawal, N Goyal - International Conference on Machine Learning, 2013
Thompson Sampling is one of the oldest heuristics for multi-armed bandit problems. It is a
randomized algorithm based on Bayesian ideas, and has recently generated significant …
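Per round, the algorithm analyzed here amounts to drawing a parameter from a Gaussian centered at the ridge estimate and acting greedily on the draw. A minimal sketch with an assumed posterior scale v and toy rewards:

```python
# Thompson sampling for linear payoffs: sample theta from a Gaussian posterior
# around the ridge estimate, then play the arm that the sample rates highest.
import numpy as np

rng = np.random.default_rng(5)
d, v = 5, 0.5                            # dimension, posterior scale (assumed)
B = np.eye(d)                            # posterior precision
f = np.zeros(d)                          # reward-weighted context sum

for t in range(300):
    mu = np.linalg.solve(B, f)           # ridge / posterior mean
    cov = v**2 * np.linalg.inv(B)
    theta = rng.multivariate_normal(mu, cov)  # posterior sample
    arms = [rng.normal(size=d) for _ in range(3)]
    x = max(arms, key=lambda a: a @ theta)    # greedy on the sample
    r = x[0] + 0.1 * rng.normal()             # toy reward (illustrative)
    B += np.outer(x, x)
    f += r * x
```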
Learning to optimize via posterior sampling
D Russo, B Van Roy - Mathematics of Operations Research, 2014
This paper considers the use of a simple posterior sampling algorithm to balance between
exploration and exploitation when learning to optimize actions such as in multiarmed bandit …
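In the simplest multi-armed setting, posterior sampling is a few lines: keep a Beta posterior per arm, draw one sample from each, and play the argmax; the arm means below are made up.

```python
# Posterior (Thompson) sampling for Bernoulli arms: maintain a Beta posterior
# per arm, draw one sample from each, and play the arm whose sample is largest.
import numpy as np

rng = np.random.default_rng(6)
p = [0.3, 0.5, 0.7]                      # hidden arm means (illustrative)
a = np.ones(3)                           # Beta successes + 1
b = np.ones(3)                           # Beta failures + 1

for t in range(2000):
    i = int(np.argmax(rng.beta(a, b)))   # one posterior draw per arm
    if rng.random() < p[i]:
        a[i] += 1
    else:
        b[i] += 1

print("posterior means:", a / (a + b))
```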