User-friendly introduction to PAC-Bayes bounds

P Alquier - Foundations and Trends® in Machine Learning, 2024 - nowpublishers.com
Aggregated predictors are obtained by making a set of basic predictors vote according to
some weights, that is, to some probability distribution. Randomized predictors are obtained …

Long-term off-policy evaluation and learning

Y Saito, H Abdollahpouri, J Anderton… - Proceedings of the …, 2024 - dl.acm.org
Short- and long-term outcomes of an algorithm often differ, with damaging downstream
effects. A known example is a click-bait algorithm, which may increase short-term clicks but …

Oracle-efficient pessimism: Offline policy optimization in contextual bandits

L Wang, A Krishnamurthy… - … Conference on Artificial …, 2024 - proceedings.mlr.press
We consider offline policy optimization (OPO) in contextual bandits, where one is given a
fixed dataset of logged interactions. While pessimistic regularizers are typically used to …

Cross-validated off-policy evaluation

M Cief, B Kveton, M Kompan - arXiv preprint arXiv:2405.15332, 2024 - arxiv.org
In this paper, we study the problem of estimator selection and hyper-parameter tuning in
off-policy evaluation. Although cross-validation is the most popular method for model selection …

Unified PAC-Bayesian study of pessimism for offline policy learning with regularized importance sampling

I Aouali, VE Brunel, D Rohde, A Korba - arXiv preprint arXiv:2406.03434, 2024 - arxiv.org
Off-policy learning (OPL) often involves minimizing a risk estimator based on importance
weighting to correct bias from the logging policy used to collect data. However, this method …
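The importance-weighted risk estimator mentioned in this entry is the standard inverse propensity scoring (IPS) construction; a minimal sketch follows, with all names and the toy data being illustrative assumptions rather than the paper's own implementation:

```python
import numpy as np

def ips_estimate(rewards, logging_probs, target_probs):
    """Inverse propensity scoring: reweight logged rewards by the ratio of
    target-policy to logging-policy action probabilities, then average."""
    w = target_probs / logging_probs  # importance weights correcting the logging bias
    return np.mean(w * rewards)

# Toy logged interactions: observed rewards and the two policies' probabilities
# for the actions that were actually taken.
rewards = np.array([1.0, 0.0, 1.0, 1.0])
logging_probs = np.array([0.5, 0.5, 0.25, 0.25])
target_probs = np.array([0.5, 0.25, 0.5, 0.5])

value = ips_estimate(rewards, logging_probs, target_probs)
```

This estimator is unbiased but high-variance when the weights are large, which is what motivates the regularized and pessimistic variants studied in this line of work.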

Why the Shooting in the Dark Method Dominates Recommender Systems Practice

D Rohde - Proceedings of the 18th ACM Conference on …, 2024 - dl.acm.org
The introduction of A/B Testing represented a great leap forward in recommender systems
research. Like the randomized controlled trial for evaluating drug efficacy, A/B Testing has …

Fast slate policy optimization: Going beyond Plackett-Luce

O Sakhi, D Rohde, N Chopin - arXiv preprint arXiv:2308.01566, 2023 - arxiv.org
An increasingly important building block of large scale machine learning systems is based
on returning slates: ordered lists of items given a query. Applications of this technology …

Bayesian off-policy evaluation and learning for large action spaces

I Aouali, VE Brunel, D Rohde, A Korba - arXiv preprint arXiv:2402.14664, 2024 - arxiv.org
In interactive systems, actions are often correlated, presenting an opportunity for more
sample-efficient off-policy evaluation (OPE) and learning (OPL) in large action spaces. We …

Logarithmic smoothing for pessimistic off-policy evaluation, selection and learning

O Sakhi, I Aouali, P Alquier, N Chopin - arXiv preprint arXiv:2405.14335, 2024 - arxiv.org
This work investigates the offline formulation of the contextual bandit problem, where the
goal is to leverage past interactions collected under a behavior policy to evaluate, select …

Position paper: Why the shooting in the dark method dominates recommender systems practice; a call to abandon anti-utopian thinking

D Rohde - arXiv preprint arXiv:2402.02152, 2024 - arxiv.org
Applied recommender systems research is in a curious position. While there is a very
rigorous protocol for measuring performance by A/B testing, best practice for finding a 'B' to …