[HTML] Advances and challenges in conversational recommender systems: A survey

C Gao, W Lei, X He, M de Rijke, TS Chua - AI open, 2021 - Elsevier
Recommender systems exploit interaction history to estimate user preference, having been
heavily used in a wide range of industry applications. However, static recommendation …

A tutorial on Thompson sampling

DJ Russo, B Van Roy, A Kazerouni… - … and Trends® in …, 2018 - nowpublishers.com
Thompson sampling is an algorithm for online decision problems where actions are taken
sequentially in a manner that must balance between exploiting what is known to maximize …

[BOOK] Bandit algorithms

T Lattimore, C Szepesvári - 2020 - books.google.com
Decision-making in the face of uncertainty is a significant challenge in machine learning,
and the multi-armed bandit model is a commonly used framework to address it. This …

Introduction to multi-armed bandits

A Slivkins - Foundations and Trends® in Machine Learning, 2019 - nowpublishers.com
Multi-armed bandits are a simple but very powerful framework for algorithms that make
decisions over time under uncertainty. An enormous body of work has accumulated over the …

Explore, exploit, and explain: personalizing explainable recommendations with bandits

J McInerney, B Lacker, S Hansen, K Higley… - Proceedings of the 12th …, 2018 - dl.acm.org
The multi-armed bandit is an important framework for balancing exploration with exploitation
in recommendation. Exploitation recommends content (e.g., products, movies, music playlists) …

Collaborative filtering bandits

S Li, A Karatzoglou, C Gentile - … of the 39th International ACM SIGIR …, 2016 - dl.acm.org
Classical collaborative filtering and content-based filtering methods try to learn a static
recommendation model given training data. These approaches are far from ideal in highly …

Reinforcement learning to rank in e-commerce search engine: Formalization, analysis, and application

Y Hu, Q Da, A Zeng, Y Yu, Y Xu - Proceedings of the 24th ACM SIGKDD …, 2018 - dl.acm.org
In E-commerce platforms such as Amazon and TaoBao, ranking items in a search session is
a typical multi-step decision-making problem. Learning to rank (LTR) methods have been …

Text-based interactive recommendation via constraint-augmented reinforcement learning

R Zhang, T Yu, Y Shen, H **… - Advances in neural …, 2019 - proceedings.neurips.cc
Text-based interactive recommendation provides richer user preferences and has
demonstrated advantages over traditional interactive recommender systems. However …

Efficient and effective tree-based and neural learning to rank

S Bruch, C Lucchese, FM Nardini - Foundations and Trends® …, 2023 - nowpublishers.com
As information retrieval researchers, we not only develop algorithmic solutions to hard
problems, but we also insist on a proper, multifaceted evaluation of ideas. The literature on …

Adversarial attacks on stochastic bandits

KS Jun, L Li, Y Ma, J Zhu - Advances in neural information …, 2018 - proceedings.neurips.cc
We study adversarial attacks that manipulate the reward signals to control the actions
chosen by a stochastic multi-armed bandit algorithm. We propose the first attack against two …