PyTorch FSDP: experiences on scaling fully sharded data parallel
It is widely acknowledged that large models have the potential to deliver superior
performance across a broad range of domains. Despite the remarkable progress made in …
Causality-based CTR prediction using graph neural networks
P Zhai, Y Yang, C Zhang - Information Processing & Management, 2023 - Elsevier
As a prevalent problem in online advertising, CTR prediction has attracted plentiful attention
from both academia and industry. Recent studies have been reported to establish CTR …
With shared microexponents, a little shifting goes a long way
This paper introduces Block Data Representations (BDR), a framework for exploring and
evaluating a wide spectrum of narrow-precision formats for deep learning. It enables …
Ads recommendation in a collapsed and entangled world
We present Tencent's ads recommendation system and examine the challenges and
practices of learning appropriate recommendation representations. Our study begins by …
InterFormer: Towards Effective Heterogeneous Interaction Learning for Click-Through Rate Prediction
Click-through rate (CTR) prediction, which predicts the probability of a user clicking an ad, is
a fundamental task in recommender systems. The emergence of heterogeneous information …
Towards GPU Memory Efficiency for Distributed Training at Scale
The scale of deep learning models has grown tremendously in recent years. State-of-the-art
models have reached billions of parameters and terabyte-scale model sizes. Training of …
Scaling User Modeling: Large-scale Online User Representations for Ads Personalization in Meta
Effective user representations are pivotal in personalized advertising. However, stringent
constraints on training throughput, serving latency, and memory, often limit the complexity …
DistDNAS: Search Efficient Feature Interactions within 2 Hours
Search efficiency and serving efficiency are two major axes in building feature interactions
and expediting the model development process in recommender systems. Searching for the …
Rankitect: Ranking architecture search battling world-class engineers at Meta scale
Neural Architecture Search (NAS) has demonstrated its efficacy in computer vision and
potential for ranking systems. However, prior work focused on academic problems, which …
adSformers: Personalization from Short-Term Sequences and Diversity of Representations in Etsy Ads
In this article, we present a general approach to personalizing ads through encoding and
learning from variable-length sequences of recent user actions and diverse representations …