Understanding training efficiency of deep learning recommendation models at scale

B Acun, M Murphy, X Wang, J Nie… - … Symposium on High …, 2021 - ieeexplore.ieee.org
The use of GPUs has proliferated for machine learning workflows and is now considered
mainstream for many deep learning models. Meanwhile, when training state-of-the-art …

RecShard: statistical feature-based memory optimization for industry-scale neural recommendation

G Sethi, B Acun, N Agarwal, C Kozyrakis… - Proceedings of the 27th …, 2022 - dl.acm.org
We propose RecShard, a fine-grained embedding table (EMB) partitioning and placement
technique for deep learning recommendation models (DLRMs). RecShard is designed …

Check-N-Run: A checkpointing system for training deep learning recommendation models

A Eisenman, KK Matam, S Ingram, D Mudigere… - … USENIX Symposium on …, 2022 - usenix.org
Checkpoints play an important role in training long running machine learning (ML) models.
Checkpoints take a snapshot of an ML model and store it in a non-volatile memory so that …

Optimizing CPU performance for recommendation systems at-scale

R Jain, S Cheng, V Kalagi, V Sanghavi, S Kaul… - Proceedings of the 50th …, 2023 - dl.acm.org
Deep Learning Recommendation Models (DLRMs) are very popular in personalized
recommendation systems and are a major contributor to the data-center AI cycles. Due to the …

RM-SSD: In-storage computing for large-scale recommendation inference

X Sun, H Wan, Q Li, CL Yang, TW Kuo… - … Symposium on High …, 2022 - ieeexplore.ieee.org
To meet the strict service level agreement requirements of recommendation systems, the
entire set of embeddings in recommendation systems needs to be loaded into the memory …

RecPipe: Co-designing models and hardware to jointly optimize recommendation quality and performance

U Gupta, S Hsia, J Zhang, M Wilkening… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
Deep learning recommendation systems must provide high quality, personalized content
under strict tail-latency targets and high system loads. This paper presents RecPipe, a …

Understanding capacity-driven scale-out neural recommendation inference

M Lui, Y Yetim, Ö Özkan, Z Zhao… - … Analysis of Systems …, 2021 - ieeexplore.ieee.org
Deep learning recommendation models have grown to the terabyte scale. Traditional
serving schemes that load entire models to a single server are unable to support this scale …

FleetRec: Large-scale recommendation inference on hybrid GPU-FPGA clusters

W Jiang, Z He, S Zhang, K Zeng, L Feng… - Proceedings of the 27th …, 2021 - dl.acm.org
We present FleetRec, a high-performance and scalable recommendation inference system
within tight latency constraints. FleetRec takes advantage of heterogeneous hardware …

Hercules: Heterogeneity-aware inference serving for at-scale personalized recommendation

L Ke, U Gupta, M Hempstead, CJ Wu… - … Symposium on High …, 2022 - ieeexplore.ieee.org
Personalized recommendation is an important class of deep-learning applications that
powers a large collection of internet services and consumes a considerable amount of …

MP-Rec: Hardware-software co-design to enable multi-path recommendation

S Hsia, U Gupta, B Acun, N Ardalani, P Zhong… - Proceedings of the 28th …, 2023 - dl.acm.org
Deep learning recommendation systems serve personalized content under diverse
tail-latency targets and input-query loads. In order to do so, state-of-the-art recommendation …