Understanding training efficiency of deep learning recommendation models at scale
The use of GPUs has proliferated for machine learning workflows and is now considered
mainstream for many deep learning models. Meanwhile, when training state-of-the-art …
RecShard: statistical feature-based memory optimization for industry-scale neural recommendation
We propose RecShard, a fine-grained embedding table (EMB) partitioning and placement
technique for deep learning recommendation models (DLRMs). RecShard is designed …
Check-N-Run: A checkpointing system for training deep learning recommendation models
Checkpoints play an important role in training long running machine learning (ML) models.
Checkpoints take a snapshot of an ML model and store it in a non-volatile memory so that …
Optimizing CPU performance for recommendation systems at-scale
Deep Learning Recommendation Models (DLRMs) are very popular in personalized
recommendation systems and are a major contributor to the data-center AI cycles. Due to the …
RM-SSD: In-storage computing for large-scale recommendation inference
To meet the strict service level agreement requirements of recommendation systems, the
entire set of embeddings in recommendation systems needs to be loaded into the memory …
RecPipe: Co-designing models and hardware to jointly optimize recommendation quality and performance
Deep learning recommendation systems must provide high quality, personalized content
under strict tail-latency targets and high system loads. This paper presents RecPipe, a …
Understanding capacity-driven scale-out neural recommendation inference
Deep learning recommendation models have grown to the terabyte scale. Traditional
serving schemes that load entire models to a single server are unable to support this scale …
FleetRec: Large-scale recommendation inference on hybrid GPU-FPGA clusters
We present FleetRec, a high-performance and scalable recommendation inference system
within tight latency constraints. FleetRec takes advantage of heterogeneous hardware …
Hercules: Heterogeneity-aware inference serving for at-scale personalized recommendation
Personalized recommendation is an important class of deep-learning applications that
powers a large collection of internet services and consumes a considerable amount of …
MP-Rec: Hardware-software co-design to enable multi-path recommendation
Deep learning recommendation systems serve personalized content under diverse tail-
latency targets and input-query loads. In order to do so, state-of-the-art recommendation …