[BOOK][B] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Linear Thompson sampling revisited

M Abeille, A Lazaric - Artificial Intelligence and Statistics, 2017 - proceedings.mlr.press
We derive an alternative proof for the regret of Thompson sampling (TS) in the stochastic
linear bandit setting. While we obtain a regret bound of order $O(d^{3/2}\sqrt{T})$ as in …

Tight regret bounds for stochastic combinatorial semi-bandits

B Kveton, Z Wen, A Ashkan… - Artificial Intelligence …, 2015 - proceedings.mlr.press
A stochastic combinatorial semi-bandit is an online learning problem where at each step a
learning agent chooses a subset of ground items subject to constraints, and then observes …

Cascading bandits: Learning to rank in the cascade model

B Kveton, C Szepesvari, Z Wen… - … conference on machine …, 2015 - proceedings.mlr.press
A search engine usually outputs a list of K web pages. The user examines this list, from the
first web page to the last, and chooses the first attractive page. This model of user behavior …

Bandit algorithms: A comprehensive review and their dynamic selection from a portfolio for multicriteria top-k recommendation

A Letard, N Gutowski, O Camp, T Amghar - Expert Systems with …, 2024 - Elsevier
This paper discusses the use of portfolio approaches based on bandit algorithms to optimize
multicriteria decision-making in recommender systems (accuracy and diversity). While …

Thompson sampling for combinatorial semi-bandits

S Wang, W Chen - International Conference on Machine …, 2018 - proceedings.mlr.press
We study the application of the Thompson sampling (TS) methodology to the stochastic
combinatorial multi-armed bandit (CMAB) framework. We analyze the standard TS algorithm …

Combinatorial bandits revisited

R Combes… - Advances in neural …, 2015 - proceedings.neurips.cc
This paper investigates stochastic and adversarial combinatorial multi-armed bandit
problems. In the stochastic setting under semi-bandit feedback, we derive a problem-specific …

Minimal exploration in structured stochastic bandits

R Combes, S Magureanu… - Advances in Neural …, 2017 - proceedings.neurips.cc
This paper introduces and addresses a wide class of stochastic bandit problems where the
function mapping the arm to the corresponding reward exhibits some known structural …

Online influence maximization under independent cascade model with semi-bandit feedback

Z Wen, B Kveton, M Valko… - Advances in neural …, 2017 - proceedings.neurips.cc
We study the online influence maximization problem in social networks under the
independent cascade model. Specifically, we aim to learn the set of "best influencers" in a …

Cascading bandits for large-scale recommendation problems

S Zong, H Ni, K Sung, NR Ke, Z Wen… - arXiv preprint arXiv …, 2016 - arxiv.org
Most recommender systems recommend a list of items. The user examines the list, from the
first item to the last, and often chooses the first attractive item and does not examine the rest …