User-friendly introduction to PAC-Bayes bounds

P Alquier - Foundations and Trends® in Machine Learning, 2024 - nowpublishers.com
Aggregated predictors are obtained by making a set of basic predictors vote according to
some weights, that is, to some probability distribution. Randomized predictors are obtained …

Long-term off-policy evaluation and learning

Y Saito, H Abdollahpouri, J Anderton… - Proceedings of the …, 2024 - dl.acm.org
Short- and long-term outcomes of an algorithm often differ, with damaging downstream
effects. A known example is a click-bait algorithm, which may increase short-term clicks but …

Oracle-efficient pessimism: Offline policy optimization in contextual bandits

L Wang, A Krishnamurthy… - … Conference on Artificial …, 2024 - proceedings.mlr.press
We consider offline policy optimization (OPO) in contextual bandits, where one is given a
fixed dataset of logged interactions. While pessimistic regularizers are typically used to …

Cross-validated off-policy evaluation

M Cief, B Kveton, M Kompan - arXiv preprint arXiv:2405.15332, 2024 - arxiv.org
In this paper, we study the problem of estimator selection and hyper-parameter tuning in
off-policy evaluation. Although cross-validation is the most popular method for model selection …

Unified PAC-Bayesian study of pessimism for offline policy learning with regularized importance sampling

I Aouali, VE Brunel, D Rohde, A Korba - arXiv preprint arXiv:2406.03434, 2024 - arxiv.org
Off-policy learning (OPL) often involves minimizing a risk estimator based on importance
weighting to correct bias from the logging policy used to collect data. However, this method …
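The importance-weighted risk estimator mentioned in this entry is the standard inverse propensity scoring (IPS) construction; a minimal sketch follows, with all names and the toy data being illustrative assumptions rather than the paper's own implementation:

```python
import numpy as np

def ips_estimate(rewards, logging_probs, target_probs):
    """Inverse propensity scoring: reweight logged rewards by the ratio of
    target-policy to logging-policy action probabilities, then average."""
    w = target_probs / logging_probs  # importance weights correcting the logging bias
    return np.mean(w * rewards)

# Toy logged interactions: observed rewards and the two policies' probabilities
# for the actions that were actually taken.
rewards = np.array([1.0, 0.0, 1.0, 1.0])
logging_probs = np.array([0.5, 0.5, 0.25, 0.25])
target_probs = np.array([0.5, 0.25, 0.5, 0.5])

value = ips_estimate(rewards, logging_probs, target_probs)
```

This estimator is unbiased but high-variance when the weights are large, which is what motivates the regularized and pessimistic variants studied in this line of work.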

Why the Shooting in the Dark Method Dominates Recommender Systems Practice

D Rohde - Proceedings of the 18th ACM Conference on …, 2024 - dl.acm.org
The introduction of A/B Testing represented a great leap forward in recommender systems
research. Like the randomized controlled trial for evaluating drug efficacy, A/B Testing has …

Fast slate policy optimization: Going beyond Plackett-Luce

O Sakhi, D Rohde, N Chopin - arXiv preprint arXiv:2308.01566, 2023 - arxiv.org
An increasingly important building block of large scale machine learning systems is based
on returning slates: ordered lists of items given a query. Applications of this technology …

Bayesian off-policy evaluation and learning for large action spaces

I Aouali, VE Brunel, D Rohde, A Korba - arXiv preprint arXiv:2402.14664, 2024 - arxiv.org
In interactive systems, actions are often correlated, presenting an opportunity for more
sample-efficient off-policy evaluation (OPE) and learning (OPL) in large action spaces. We …

Logarithmic smoothing for pessimistic off-policy evaluation, selection and learning

O Sakhi, I Aouali, P Alquier, N Chopin - arXiv preprint arXiv:2405.14335, 2024 - arxiv.org
This work investigates the offline formulation of the contextual bandit problem, where the
goal is to leverage past interactions collected under a behavior policy to evaluate, select …

Position paper: Why the shooting in the dark method dominates recommender systems practice; a call to abandon anti-utopian thinking

D Rohde - arXiv preprint arXiv:2402.02152, 2024 - arxiv.org
Applied recommender systems research is in a curious position. While there is a very
rigorous protocol for measuring performance by A/B testing, best practice for finding a 'B' to …