PyTorch FSDP: experiences on scaling fully sharded data parallel

Y Zhao, A Gu, R Varma, L Luo, CC Huang, M Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
It is widely acknowledged that large models have the potential to deliver superior
performance across a broad range of domains. Despite the remarkable progress made in …

Causality-based CTR prediction using graph neural networks

P Zhai, Y Yang, C Zhang - Information Processing & Management, 2023 - Elsevier
As a prevalent problem in online advertising, CTR prediction has attracted plentiful attention
from both academia and industry. Recent studies have been reported to establish CTR …

With shared microexponents, a little shifting goes a long way

B Darvish Rouhani, R Zhao, V Elango… - Proceedings of the 50th …, 2023 - dl.acm.org
This paper introduces Block Data Representations (BDR), a framework for exploring and
evaluating a wide spectrum of narrow-precision formats for deep learning. It enables …

Ads recommendation in a collapsed and entangled world

J Pan, W Xue, X Wang, H Yu, X Liu, S Quan… - Proceedings of the 30th …, 2024 - dl.acm.org
We present Tencent's ads recommendation system and examine the challenges and
practices of learning appropriate recommendation representations. Our study begins by …

InterFormer: Towards Effective Heterogeneous Interaction Learning for Click-Through Rate Prediction

Z Zeng, X Liu, M Hang, X Liu, Q Zhou, C Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Click-through rate (CTR) prediction, which predicts the probability of a user clicking an ad, is
a fundamental task in recommender systems. The emergence of heterogeneous information …

Towards GPU Memory Efficiency for Distributed Training at Scale

R Cheng, C Cai, S Yilmaz, R Mitra, M Bag… - Proceedings of the …, 2023 - dl.acm.org
The scale of deep learning models has grown tremendously in recent years. State-of-the-art
models have reached billions of parameters and terabyte-scale model sizes. Training of …

Scaling User Modeling: Large-scale Online User Representations for Ads Personalization in Meta

W Zhang, D Li, C Liang, F Zhou, Z Zhang… - … Proceedings of the …, 2024 - dl.acm.org
Effective user representations are pivotal in personalized advertising. However, stringent
constraints on training throughput, serving latency, and memory, often limit the complexity …

DistDNAS: Search Efficient Feature Interactions within 2 Hours

T Zhang, W Wen, I Fedorov, X Liu… - … Conference on Big …, 2024 - ieeexplore.ieee.org
Search efficiency and serving efficiency are two major axes in building feature interactions
and expediting the model development process in recommender systems. Searching for the …

Rankitect: Ranking architecture search battling world-class engineers at Meta scale

W Wen, KH Liu, I Fedorov, X Zhang, H Yin… - … Proceedings of the …, 2024 - dl.acm.org
Neural Architecture Search (NAS) has demonstrated its efficacy in computer vision and
potential for ranking systems. However, prior work focused on academic problems, which …

adSformers: Personalization from Short-Term Sequences and Diversity of Representations in Etsy Ads

A Awad, D Roberts, E Dolev, A Heyman… - arXiv preprint arXiv …, 2023 - arxiv.org
In this article, we present a general approach to personalizing ads through encoding and
learning from variable-length sequences of recent user actions and diverse representations …