PyTorch FSDP: experiences on scaling fully sharded data parallel

Y Zhao, A Gu, R Varma, L Luo, CC Huang, M Xu… - arXiv preprint arXiv …, 2023 - arxiv.org
It is widely acknowledged that large models have the potential to deliver superior
performance across a broad range of domains. Despite the remarkable progress made in …

With shared microexponents, a little shifting goes a long way

B Darvish Rouhani, R Zhao, V Elango… - Proceedings of the 50th …, 2023 - dl.acm.org
This paper introduces Block Data Representations (BDR), a framework for exploring and
evaluating a wide spectrum of narrow-precision formats for deep learning. It enables …

Ads recommendation in a collapsed and entangled world

J Pan, W Xue, X Wang, H Yu, X Liu, S Quan… - Proceedings of the 30th …, 2024 - dl.acm.org
We present Tencent's ads recommendation system and examine the challenges and
practices of learning appropriate recommendation representations. Our study begins by …

Causality-based CTR prediction using graph neural networks

P Zhai, Y Yang, C Zhang - Information Processing & Management, 2023 - Elsevier
As a prevalent problem in online advertising, CTR prediction has attracted plentiful attention
from both academia and industry. Recent studies have been reported to establish CTR …

Grace: A scalable graph-based approach to accelerating recommendation model inference

H Ye, S Vedula, Y Chen, Y Yang, A Bronstein… - Proceedings of the 28th …, 2023 - dl.acm.org
The high memory bandwidth demand of sparse embedding layers continues to be a critical
challenge in scaling the performance of recommendation models. While prior works have …

InterFormer: Towards Effective Heterogeneous Interaction Learning for Click-Through Rate Prediction

Z Zeng, X Liu, M Hang, X Liu, Q Zhou, C Yang… - arXiv preprint arXiv …, 2024 - arxiv.org
Click-through rate (CTR) prediction, which predicts the probability of a user clicking an ad, is
a fundamental task in recommender systems. The emergence of heterogeneous information …

Scaling User Modeling: Large-scale Online User Representations for Ads Personalization in Meta

W Zhang, D Li, C Liang, F Zhou, Z Zhang… - … Proceedings of the …, 2024 - dl.acm.org
Effective user representations are pivotal in personalized advertising. However, stringent
constraints on training throughput, serving latency, and memory, often limit the complexity …

AutoML for Large Capacity Modeling of Meta's Ranking Systems

H Yin, KH Liu, M Sun, Y Chen, B Zhang, J Liu… - … Proceedings of the …, 2024 - dl.acm.org
Web-scale ranking systems at Meta serving billions of users is complex. Improving ranking
models is essential but engineering heavy. Automated Machine Learning (AutoML) can …

Towards GPU Memory Efficiency for Distributed Training at Scale

R Cheng, C Cai, S Yilmaz, R Mitra, M Bag… - Proceedings of the …, 2023 - dl.acm.org
The scale of deep learning models has grown tremendously in recent years. State-of-the-art
models have reached billions of parameters and terabyte-scale model sizes. Training of …

Rankitect: Ranking architecture search battling world-class engineers at Meta scale

W Wen, KH Liu, I Fedorov, X Zhang, H Yin… - … Proceedings of the …, 2024 - dl.acm.org
Neural Architecture Search (NAS) has demonstrated its efficacy in computer vision and
potential for ranking systems. However, prior work focused on academic problems, which …