Software-hardware co-design for fast and scalable training of deep learning recommendation models
Deep learning recommendation models (DLRMs) have been used across many
business-critical services at Meta and are the single largest AI application in terms of infrastructure …
Off-policy evaluation for large action spaces via conjunct effect modeling
We study off-policy evaluation (OPE) of contextual bandit policies for large discrete action
spaces where conventional importance-weighting approaches suffer from excessive …
Pessimistic reward models for off-policy learning in recommendation
Methods for bandit learning from user interactions often require a model of the reward a
certain context-action pair will yield–for example, the probability of a click on a …
Off-policy evaluation for large action spaces via policy convolution
Developing accurate off-policy estimators is crucial for both evaluating and optimizing for
new policies. The main challenge in off-policy estimation is the distribution shift between the …
Pessimistic decision-making for recommender systems
Modern recommender systems are often modelled under the sequential decision-making
paradigm, where the system decides which recommendations to show in order to maximise …
On (Normalised) Discounted Cumulative Gain as an Off-Policy Evaluation Metric for Top-n Recommendation
Approaches to recommendation are typically evaluated in one of two ways: (1) via a
(simulated) online experiment, often seen as the gold standard, or (2) via some offline …
On component interactions in two-stage recommender systems
Thanks to their scalability, two-stage recommenders are used by many of today's largest
online platforms, including YouTube, LinkedIn, and Pinterest. These systems produce …
Top-k extreme contextual bandits with arm hierarchy
Motivated by modern applications, such as online advertisement and recommender
systems, we study the top-$k$ extreme contextual bandits problem, where the total number …
The digital transformation in health: How AI can improve the performance of health systems
Á Periáñez, A Fernández Del Río, I Nazarov… - Health Systems & …, 2024 - Taylor & Francis
Mobile health has the potential to revolutionize health care delivery and patient
engagement. In this work, we discuss how integrating Artificial Intelligence into digital health …
POTEC: Off-Policy Learning for Large Action Spaces via Two-Stage Policy Decomposition
We study off-policy learning (OPL) of contextual bandit policies in large discrete action
spaces where existing methods--most of which rely crucially on reward-regression models …