Advances and challenges in conversational recommender systems: A survey
Recommender systems exploit interaction history to estimate user preference, having been
heavily used in a wide range of industry applications. However, static recommendation …
A tutorial on thompson sampling
Thompson sampling is an algorithm for online decision problems where actions are taken
sequentially in a manner that must balance between exploiting what is known to maximize …
[BOOK] Bandit algorithms
T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …
Introduction to multi-armed bandits
A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …
Explore, exploit, and explain: personalizing explainable recommendations with bandits
The multi-armed bandit is an important framework for balancing exploration with exploitation
in recommendation. Exploitation recommends content (e.g., products, movies, music playlists) …
Collaborative filtering bandits
Classical collaborative filtering and content-based filtering methods try to learn a static
recommendation model given training data. These approaches are far from ideal in highly …
Reinforcement learning to rank in e-commerce search engine: Formalization, analysis, and application
In E-commerce platforms such as Amazon and TaoBao, ranking items in a search session is
a typical multi-step decision-making problem. Learning to rank (LTR) methods have been …
Text-based interactive recommendation via constraint-augmented reinforcement learning
Text-based interactive recommendation provides richer user preferences and has
demonstrated advantages over traditional interactive recommender systems. However …
Efficient and effective tree-based and neural learning to rank
As information retrieval researchers, we not only develop algorithmic solutions to hard
problems, but we also insist on a proper, multifaceted evaluation of ideas. The literature on …
Adversarial attacks on stochastic bandits
We study adversarial attacks that manipulate the reward signals to control the actions
chosen by a stochastic multi-armed bandit algorithm. We propose the first attack against two …