User-friendly introduction to PAC-Bayes bounds
P Alquier - Foundations and Trends® in Machine Learning, 2024 - nowpublishers.com
Aggregated predictors are obtained by making a set of basic predictors vote according to
some weights, that is, to some probability distribution. Randomized predictors are obtained …
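As a rough illustration of the distinction drawn in this abstract (not code from the monograph itself), the sketch below builds an aggregated predictor by letting a few made-up basic predictors vote under a probability distribution rho, and a randomized predictor by drawing a single predictor from rho; all predictors and weights are placeholders.

    import numpy as np

    # Hypothetical basic predictors h_k and a weight distribution rho over them.
    predictors = [lambda x: x > 0.2, lambda x: x > 0.5, lambda x: x > 0.8]
    rho = np.array([0.2, 0.5, 0.3])  # probability distribution over predictors

    def aggregated_predict(x):
        # Weighted vote: average the basic predictions under rho.
        votes = np.array([float(h(x)) for h in predictors])
        return float(rho @ votes) >= 0.5

    def randomized_predict(x, rng):
        # Randomized predictor: draw one basic predictor from rho, then apply it.
        k = rng.choice(len(predictors), p=rho)
        return predictors[k](x)

    rng = np.random.default_rng(0)
    print(aggregated_predict(0.6), randomized_predict(0.6, rng))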
Long-term off-policy evaluation and learning
Short- and long-term outcomes of an algorithm often differ, with damaging downstream
effects. A known example is a click-bait algorithm, which may increase short-term clicks but …
Oracle-efficient pessimism: Offline policy optimization in contextual bandits
L Wang, A Krishnamurthy… - … Conference on Artificial …, 2024 - proceedings.mlr.press
We consider offline policy optimization (OPO) in contextual bandits, where one is given a
fixed dataset of logged interactions. While pessimistic regularizers are typically used to …
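As a hedged sketch of the general pessimism idea in the snippet (not the oracle-efficient procedure the paper develops), the code below scores each candidate policy on logged data with an importance-weighted value estimate, subtracts a simple uncertainty penalty, and keeps the best penalized value; the penalty form, the field names, and alpha are assumptions for illustration.

    import numpy as np

    def pessimistic_opo(logged, candidate_policies, alpha=1.0):
        # logged: dict with arrays 'x' (contexts), 'a' (actions), 'r' (rewards),
        # 'p0' (logging propensities); candidate_policies: functions pi(a, x) -> prob.
        n = len(logged["r"])
        best, best_lcb = None, -np.inf
        for pi in candidate_policies:
            w = np.array([pi(a, x) / p for x, a, p in
                          zip(logged["x"], logged["a"], logged["p0"])])
            terms = w * logged["r"]                      # per-sample IPS terms
            estimate = terms.mean()                      # IPS value estimate
            penalty = alpha * terms.std() / np.sqrt(n)   # crude uncertainty penalty
            lcb = estimate - penalty                     # pessimistic lower bound
            if lcb > best_lcb:
                best, best_lcb = pi, lcb
        return best, best_lcb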
Cross-validated off-policy evaluation
In this paper, we study the problem of estimator selection and hyper-parameter tuning in off-
policy evaluation. Although cross-validation is the most popular method for model selection …
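Since the snippet only poses the problem (picking an OPE estimator and its hyper-parameters), the sketch below is a generic illustration of K-fold selection over logged bandit data rather than the paper's procedure: each candidate is evaluated on the training folds and compared against an unbiased but noisy IPS reference on the held-out fold. The data layout and function signatures are assumptions.

    import numpy as np

    def select_ope_estimator(estimators, logged, target_policy, k=5, seed=0):
        # estimators: dict name -> function(train_data, target_policy) -> value estimate
        # logged: list of (x, a, r, p0) tuples collected under the logging policy
        rng = np.random.default_rng(seed)
        idx = rng.permutation(len(logged))
        folds = np.array_split(idx, k)
        scores = {name: [] for name in estimators}
        for i in range(k):
            held_out = set(folds[i].tolist())
            val = [logged[j] for j in folds[i]]
            train = [logged[j] for j in idx if int(j) not in held_out]
            # Unbiased (high-variance) IPS estimate on the held-out fold as a reference.
            ref = np.mean([target_policy(a, x) / p0 * r for x, a, r, p0 in val])
            for name, estimate in estimators.items():
                scores[name].append((estimate(train, target_policy) - ref) ** 2)
        return min(scores, key=lambda name: float(np.mean(scores[name])))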
Unified PAC-Bayesian study of pessimism for offline policy learning with regularized importance sampling
Off-policy learning (OPL) often involves minimizing a risk estimator based on importance
weighting to correct bias from the logging policy used to collect data. However, this method …
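To make the importance-weighting step concrete, here is a minimal sketch of the standard IPS risk estimator together with a clipped ("regularized") variant; it illustrates the generic technique mentioned in the snippet, not the paper's unified PAC-Bayesian analysis, and the logged-data format and cost convention are assumptions.

    import numpy as np

    def ips_risk(policy, logged):
        # logged: iterable of (x, a, cost, p0) from the logging policy (lower cost is better).
        terms = [policy(a, x) / p0 * cost for x, a, cost, p0 in logged]
        return float(np.mean(terms))

    def clipped_ips_risk(policy, logged, m=10.0):
        # One common regularization: clip the importance weight at m, trading a little
        # bias for lower variance (one of several weight corrections in the literature).
        terms = [min(policy(a, x) / p0, m) * cost for x, a, cost, p0 in logged]
        return float(np.mean(terms))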
Why the Shooting in the Dark Method Dominates Recommender Systems Practice
D Rohde - Proceedings of the 18th ACM Conference on …, 2024 - dl.acm.org
The introduction of A/B Testing represented a great leap forward in recommender systems
research. Like the randomized controlled trial for evaluating drug efficacy, A/B Testing has …
Fast slate policy optimization: Going beyond Plackett-Luce
An increasingly important building block of large scale machine learning systems is based
on returning slates: ordered lists of items given a query. Applications of this technology …
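For context on the baseline named in the title, the sketch below samples a slate from a Plackett-Luce distribution by drawing items sequentially without replacement, proportionally to softmax scores; it shows the standard model only, not the faster optimization the paper proposes, and the scores are placeholders.

    import numpy as np

    def sample_plackett_luce_slate(scores, slate_size, rng):
        # scores: 1-D array of item logits for the current query.
        remaining = list(range(len(scores)))
        slate = []
        for _ in range(slate_size):
            logits = scores[remaining]
            probs = np.exp(logits - logits.max())
            probs /= probs.sum()
            pick = int(rng.choice(len(remaining), p=probs))
            slate.append(remaining.pop(pick))
        return slate

    rng = np.random.default_rng(0)
    print(sample_plackett_luce_slate(np.array([2.0, 1.0, 0.5, -1.0]), 3, rng))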
Bayesian off-policy evaluation and learning for large action spaces
In interactive systems, actions are often correlated, presenting an opportunity for more
sample-efficient off-policy evaluation (OPE) and learning (OPL) in large action spaces. We …
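The claim that correlated actions allow more sample-efficient estimation can be illustrated with a deliberately simplified hierarchical shrinkage estimate: per-action reward means are pulled toward a shared empirical prior mean, so rarely logged actions borrow strength from the rest. This is a generic Bayesian illustration only, not the estimator developed in the paper, and the data format is an assumption.

    import numpy as np

    def shrunken_action_means(rewards_by_action, prior_var=1.0, noise_var=1.0):
        # rewards_by_action: dict action -> list of observed rewards.
        all_rewards = np.concatenate([np.asarray(v, float) for v in rewards_by_action.values()])
        mu0 = all_rewards.mean()  # empirical-Bayes prior mean shared across actions
        posterior = {}
        for a, r in rewards_by_action.items():
            r = np.asarray(r, float)
            precision = len(r) / noise_var + 1.0 / prior_var
            posterior[a] = (r.sum() / noise_var + mu0 / prior_var) / precision
        return posterior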
Logarithmic smoothing for pessimistic off-policy evaluation, selection and learning
This work investigates the offline formulation of the contextual bandit problem, where the
goal is to leverage past interactions collected under a behavior policy to evaluate, select …
Position paper: Why the shooting in the dark method dominates recommender systems practice; a call to abandon anti-utopian thinking
D Rohde - arXiv preprint arXiv:2402.02152, 2024 - arxiv.org
Applied recommender systems research is in a curious position. While there is a very
rigorous protocol for measuring performance by A/B testing, best practice for finding a 'B' to …