[Book][B] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Reinforcement learning to rank in e-commerce search engine: Formalization, analysis, and application
In E-commerce platforms such as Amazon and TaoBao, ranking items in a search session is
a typical multi-step decision-making problem. Learning to rank (LTR) methods have been …
Thompson sampling for combinatorial semi-bandits
S Wang, W Chen - International Conference on Machine …, 2018 - proceedings.mlr.press
We study the application of the Thompson sampling (TS) methodology to the stochastic
combinatorial multi-armed bandit (CMAB) framework. We analyze the standard TS algorithm …
Combinatorial multi-armed bandit with general reward functions
In this paper, we study the stochastic combinatorial multi-armed bandit (CMAB) framework
that allows a general nonlinear reward function, whose expected value may not depend only …
Contextual combinatorial cascading bandits
We propose the contextual combinatorial cascading bandits, a combinatorial online learning
game, where at each time step a learning agent is given a set of contextual information, then …
Online influence maximization under independent cascade model with semi-bandit feedback
We study the online influence maximization problem in social networks under the
independent cascade model. Specifically, we aim to learn the set of "best influencers" in a …
Cascading bandits for large-scale recommendation problems
Most recommender systems recommend a list of items. The user examines the list, from the
first item to the last, and often chooses the first attractive item and does not examine the rest …
Online learning to rank in stochastic click models
Online learning to rank is a core problem in information retrieval and machine learning.
Many provably efficient algorithms have been recently proposed for this problem in specific …
Contextual combinatorial bandits with probabilistically triggered arms
We study contextual combinatorial bandits with probabilistically triggered arms (C$^2$MAB-T)
under a variety of smoothness conditions that capture a wide range of applications …