Sustainable AI: Environmental implications, challenges and opportunities
This paper explores the environmental impact of the super-linear growth trends for AI from a
holistic perspective, spanning Data, Algorithms, and System Hardware. We characterize the …
Understanding data storage and ingestion for large-scale deep recommendation model training: Industrial product
Datacenter-scale AI training clusters consisting of thousands of domain-specific accelerators
(DSA) are used to train increasingly-complex deep learning models. These clusters rely on a …
CASSINI: Network-Aware Job Scheduling in Machine Learning Clusters
We present CASSINI, a network-aware job scheduler for machine learning (ML) clusters.
CASSINI introduces a novel geometric abstraction to consider the communication pattern of …
Estimating and penalizing induced preference shifts in recommender systems
The content that a recommender system (RS) shows to users influences them. Therefore,
when choosing a recommender to deploy, one is implicitly also choosing to induce specific …
MTIA: First generation silicon targeting Meta's recommendation systems
Meta has traditionally relied on using CPU-based servers for running inference workloads,
specifically Deep Learning Recommendation Models (DLRM), but the increasing compute …
Congestion control in machine learning clusters
This paper argues that fair-sharing, the holy grail of congestion control algorithms for
decades, is not necessarily a desirable property in Machine Learning (ML) training clusters …
Better Together: Jointly Optimizing ML Collective Scheduling and Execution Planning using SYNDICATE
Emerging ML training deployments are trending towards larger models, and hybrid-parallel
training that is not just dominated by compute-intensive all-reduce for gradient aggregation …
Training personalized recommendation systems from (GPU) scratch: Look forward not backwards
Personalized recommendation models (RecSys) are one of the most popular machine
learning workloads serviced by hyperscalers. A critical challenge of training RecSys is its …
Understanding RDMA microarchitecture resources for performance isolation
Recent years have witnessed the wide adoption of RDMA in the cloud to accelerate first-
party workloads and achieve cost savings by freeing up CPU cycles. Now cloud providers …
Enabling compute-communication overlap in distributed deep learning training platforms
Deep Learning (DL) training platforms are built by interconnecting multiple DL accelerators
(e.g., GPU/TPU) via fast, customized interconnects with 100s of gigabytes (GBs) of bandwidth …