[BOOK] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

More adaptive algorithms for adversarial bandits

CY Wei, H Luo - Conference On Learning Theory, 2018 - proceedings.mlr.press
We develop a novel and generic algorithm for the adversarial multi-armed bandit problem
(or more generally the combinatorial semi-bandit problem). When instantiated differently, our …

What doubling tricks can and can't do for multi-armed bandits

L Besson, E Kaufmann - arxiv preprint arxiv:1803.06971, 2018 - arxiv.org
An online reinforcement learning algorithm is anytime if it does not need to know in advance
the horizon T of the experiment. A well-known technique to obtain an anytime algorithm from …
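The doubling trick referred to in this snippet is the standard one: run a fixed-horizon algorithm on geometrically growing horizons 1, 2, 4, 8, …, restarting it fresh at the start of each phase, so the wrapper never needs to know the true horizon in advance. A minimal Python sketch, assuming a hypothetical fixed-horizon interface with `select_arm()` and `update(arm, reward)`:

```python
def run_anytime(make_alg, pull, total_rounds):
    """Doubling trick: restart a fixed-horizon bandit algorithm on
    horizons 1, 2, 4, 8, ... so the resulting procedure is anytime,
    i.e. it never needs total_rounds in advance.

    make_alg(T) returns a fresh fixed-horizon algorithm (hypothetical
    interface: select_arm() and update(arm, reward)); pull(arm) is the
    environment's reward feedback.
    """
    t, i = 0, 0
    while t < total_rounds:
        horizon = 2 ** i
        alg = make_alg(horizon)  # fresh start: all past data is forgotten
        for _ in range(min(horizon, total_rounds - t)):
            arm = alg.select_arm()
            alg.update(arm, pull(arm))
            t += 1
        i += 1
```

Each restart discards the observations gathered in earlier phases; the regret cost of that forgetting is the kind of question the paper above examines.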

SIC-MMAB: Synchronisation involves communication in multiplayer multi-armed bandits

E Boursier, V Perchet - Advances in Neural Information …, 2019 - proceedings.neurips.cc
Motivated by cognitive radio networks, we consider the stochastic multiplayer multi-armed
bandit problem, where several players pull arms simultaneously and collisions occur if one …

Thompson sampling with less exploration is fast and optimal

T Jin, X Yang, X Xiao, P Xu - International Conference on …, 2023 - proceedings.mlr.press
We propose $\epsilon$-Exploring Thompson Sampling ($\epsilon$-TS), a
modified version of the Thompson Sampling (TS) algorithm for multi-armed bandits. In …
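The truncated abstract does not spell out the modification, so the following is only a plausible sketch for Bernoulli rewards, under the assumption that $\epsilon$-TS draws posterior samples with probability $\epsilon$ and otherwise acts greedily on the posterior mean; the function name and interface are illustrative, not from the paper:

```python
import random

def eps_ts_select(successes, failures, eps=0.1, rng=random):
    """Illustrative epsilon-TS arm selection for Bernoulli bandits under
    a Beta(1,1) prior (name and interface are hypothetical): with
    probability eps draw one posterior sample per arm as in vanilla TS;
    otherwise act greedily on the posterior mean."""
    if rng.random() < eps:
        scores = [rng.betavariate(s + 1, f + 1)
                  for s, f in zip(successes, failures)]
    else:
        scores = [(s + 1) / (s + f + 2)  # Beta(1+s, 1+f) posterior mean
                  for s, f in zip(successes, failures)]
    return max(range(len(scores)), key=scores.__getitem__)
```

Setting `eps=1.0` recovers ordinary posterior sampling; `eps=0.0` is pure greedy on the posterior mean.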

Stochastic multi-armed bandits with strongly reward-dependent delays

Y Tang, Y Wang, Z Zheng - International Conference on …, 2024 - proceedings.mlr.press
There has been increasing interest in applying multi-armed bandits to adaptive designs in
clinical trials. However, most literature assumes that a previous patient's survival response of …

Statistical efficiency of Thompson sampling for combinatorial semi-bandits

P Perrault, E Boursier, M Valko… - Advances in Neural …, 2020 - proceedings.neurips.cc
We investigate stochastic combinatorial multi-armed bandit with semi-bandit feedback
(CMAB). In CMAB, the question of the existence of an efficient policy with an optimal …

Learning in repeated auctions

T Nedelec, C Calauzènes, N El Karoui… - … and Trends® in …, 2022 - nowpublishers.com
Online auctions are one of the most fundamental facets of the modern economy and power
an industry generating hundreds of billions of dollars a year in revenue. Auction theory has …

Finite-time regret of Thompson sampling algorithms for exponential family multi-armed bandits

T Jin, P Xu, X Xiao… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits,
where the reward distribution is from a one-dimensional exponential family, which covers …

MOTS: Minimax optimal Thompson sampling

T Jin, P Xu, J Shi, X Xiao, Q Gu - … Conference on Machine …, 2021 - proceedings.mlr.press
Thompson sampling is one of the most widely used algorithms in many online decision
problems due to its simplicity for implementation and superior empirical performance over …
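For reference, the simplicity the snippet mentions is visible in the Beta-Bernoulli case, where standard Thompson sampling fits in a few lines (a generic illustration, not the minimax-optimal MOTS variant):

```python
import random

def thompson_bernoulli(probs, horizon, seed=0):
    """Plain Beta-Bernoulli Thompson sampling: keep a Beta(1+s, 1+f)
    posterior per arm, sample once from each posterior, pull the
    arg-max arm, then update that arm's success/failure counts."""
    rng = random.Random(seed)
    k = len(probs)
    succ, fail = [0] * k, [0] * k
    for _ in range(horizon):
        samples = [rng.betavariate(1 + succ[a], 1 + fail[a])
                   for a in range(k)]
        arm = max(range(k), key=samples.__getitem__)
        if rng.random() < probs[arm]:  # simulated Bernoulli reward
            succ[arm] += 1
        else:
            fail[arm] += 1
    return succ, fail
```

On a two-armed instance with means 0.9 and 0.1, the posterior concentrates quickly and the better arm receives the large majority of the 500 pulls.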