Software-hardware co-design for fast and scalable training of deep learning recommendation models

D Mudigere, Y Hao, J Huang, Z Jia, A Tulloch… - Proceedings of the 49th …, 2022 - dl.acm.org
Deep learning recommendation models (DLRMs) have been used across many business-
critical services at Meta and are the single largest AI application in terms of infrastructure …

Off-policy evaluation for large action spaces via conjunct effect modeling

Y Saito, Q Ren, T Joachims - International Conference on …, 2023 - proceedings.mlr.press
We study off-policy evaluation (OPE) of contextual bandit policies for large discrete action
spaces where conventional importance-weighting approaches suffer from excessive …

Pessimistic reward models for off-policy learning in recommendation

O Jeunen, B Goethals - Proceedings of the 15th ACM Conference on …, 2021 - dl.acm.org
Methods for bandit learning from user interactions often require a model of the reward a
certain context-action pair will yield–for example, the probability of a click on a …

Off-policy evaluation for large action spaces via policy convolution

N Sachdeva, L Wang, D Liang, N Kallus… - Proceedings of the ACM …, 2024 - dl.acm.org
Developing accurate off-policy estimators is crucial for both evaluating and optimizing for
new policies. The main challenge in off-policy estimation is the distribution shift between the …

Pessimistic decision-making for recommender systems

O Jeunen, B Goethals - ACM Transactions on Recommender Systems, 2023 - dl.acm.org
Modern recommender systems are often modelled under the sequential decision-making
paradigm, where the system decides which recommendations to show in order to maximise …

On (Normalised) Discounted Cumulative Gain as an Off-Policy Evaluation Metric for Top-n Recommendation

O Jeunen, I Potapov, A Ustimenko - Proceedings of the 30th ACM …, 2024 - dl.acm.org
Approaches to recommendation are typically evaluated in one of two ways: (1) via a
(simulated) online experiment, often seen as the gold standard, or (2) via some offline …

On component interactions in two-stage recommender systems

J Hron, K Krauth, M Jordan… - Advances in neural …, 2021 - proceedings.neurips.cc
Thanks to their scalability, two-stage recommenders are used by many of today's largest
online platforms, including YouTube, LinkedIn, and Pinterest. These systems produce …

Top-k extreme contextual bandits with arm hierarchy

R Sen, A Rakhlin, L Ying, R Kidambi… - International …, 2021 - proceedings.mlr.press
Motivated by modern applications, such as online advertisement and recommender
systems, we study the top-$k$ extreme contextual bandits problem, where the total number …

The digital transformation in health: How AI can improve the performance of health systems

Á Periáñez, A Fernández Del Río, I Nazarov… - Health Systems & …, 2024 - Taylor & Francis
Mobile health has the potential to revolutionize health care delivery and patient
engagement. In this work, we discuss how integrating Artificial Intelligence into digital health …

POTEC: Off-Policy Learning for Large Action Spaces via Two-Stage Policy Decomposition

Y Saito, J Yao, T Joachims - arXiv preprint arXiv:2402.06151, 2024 - arxiv.org
We study off-policy learning (OPL) of contextual bandit policies in large discrete action
spaces where existing methods--most of which rely crucially on reward-regression models …