A unified architecture for accelerating distributed DNN training in heterogeneous GPU/CPU clusters
Data center clusters that run DNN training jobs are inherently heterogeneous. They have
GPUs and CPUs for computation and network bandwidth for distributed training. However …
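The abstract above is truncated, but the title points at combining GPU workers with spare CPU machines for aggregation. As a rough, hypothetical sketch of that general idea (the class and function names below are made up, not the paper's API), each worker can push partitions of its gradient to CPU-side summation servers and pull back the average:

```python
import numpy as np

def partition(grad, num_servers):
    """Split a flat gradient into roughly equal chunks, one per CPU server."""
    return np.array_split(grad, num_servers)

class CpuSummationServer:
    """Hypothetical CPU-side aggregator: sums the partition it owns across workers."""
    def __init__(self):
        self.acc = None
        self.count = 0

    def push(self, chunk):
        self.acc = chunk.copy() if self.acc is None else self.acc + chunk
        self.count += 1

    def pull(self):
        return self.acc / self.count  # averaged gradient partition

# Example: 3 GPU workers, 2 CPU summation servers.
workers = [np.random.randn(10).astype(np.float32) for _ in range(3)]
servers = [CpuSummationServer() for _ in range(2)]

for grad in workers:                       # each worker pushes its partitions
    for server, chunk in zip(servers, partition(grad, len(servers))):
        server.push(chunk)

averaged = np.concatenate([s.pull() for s in servers])
assert np.allclose(averaged, np.mean(workers, axis=0))
```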
Scaling distributed machine learning with in-network aggregation
Training machine learning models in parallel is an increasingly important workload. We
accelerate distributed parallel training by designing a communication primitive that uses a …
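The primitive referenced above aggregates gradients inside the network. One common way to make switch-side aggregation feasible is to quantize gradients to fixed-point integers so they can be summed with integer arithmetic; the sketch below emulates that idea in plain Python, with an illustrative scaling factor rather than anything from the paper's actual protocol:

```python
import numpy as np

SCALE = 2 ** 16  # illustrative fixed-point scaling factor

def to_fixed_point(grad):
    """Quantize float gradients to integers so a switch can add them exactly."""
    return np.round(grad * SCALE).astype(np.int64)

def from_fixed_point(agg, num_workers):
    """Convert the integer sum back to an averaged float gradient."""
    return agg.astype(np.float64) / (SCALE * num_workers)

def switch_aggregate(packets):
    """Stand-in for the programmable switch: element-wise integer summation."""
    return np.sum(packets, axis=0)

grads = [np.random.randn(8) for _ in range(4)]          # 4 workers
agg = switch_aggregate([to_fixed_point(g) for g in grads])
avg = from_fixed_point(agg, len(grads))
assert np.allclose(avg, np.mean(grads, axis=0), atol=1e-4)
```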
Tiresias: A GPU cluster manager for distributed deep learning
Deep learning (DL) training jobs bring some unique challenges to existing cluster
managers, such as unpredictable training times, an all-or-nothing execution model, and …
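Since the abstract highlights unpredictable training times, a scheduler cannot rely on knowing job durations in advance. A generic least-attained-service policy sidesteps this by always running the job that has received the least GPU time so far; the toy sketch below illustrates that policy only and is not the paper's exact algorithm:

```python
import heapq

def las_schedule(jobs, quantum=1.0, horizon=20.0):
    """Toy least-attained-service scheduler: always run the job that has
    received the least GPU time so far, without knowing remaining durations."""
    # heap entries: (attained_service, job_id, remaining_time)
    heap = [(0.0, jid, rem) for jid, rem in jobs.items()]
    heapq.heapify(heap)
    timeline, clock = [], 0.0
    while heap and clock < horizon:
        attained, jid, remaining = heapq.heappop(heap)
        run = min(quantum, remaining)
        timeline.append((clock, jid, run))
        clock += run
        remaining -= run
        if remaining > 1e-9:
            heapq.heappush(heap, (attained + run, jid, remaining))
    return timeline

# Short jobs finish quickly even though durations are unknown in advance.
print(las_schedule({"jobA": 5.0, "jobB": 1.5, "jobC": 3.0}))
```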
ATP: In-network aggregation for multi-tenant learning
Distributed deep neural network training (DT) systems are widely deployed in clusters where
the network is shared across multiple tenants, i.e., multiple DT jobs. Each DT job computes …
In-network aggregation for data center networks: A survey
A Feng, D Dong, F Lei, J Ma, E Yu, R Wang - Computer Communications, 2023 - Elsevier
Aggregation applications are widely deployed in data centers, such as distributed machine
learning and MapReduce-like frameworks. These applications typically have large …
Distributed hierarchical GPU parameter server for massive scale deep learning ads systems
Neural networks of ads systems usually take input from multiple resources, e.g., query-ad
relevance, ad features and user portraits. These inputs are encoded into one-hot or multi-hot …
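The snippet mentions one-hot and multi-hot encodings of sparse input features. As a small illustration (the table size and sum-pooling choice here are arbitrary assumptions, not the paper's design), a multi-hot feature is a variable-length list of IDs whose embedding rows are looked up and pooled:

```python
import numpy as np

VOCAB, DIM = 1000, 8
rng = np.random.default_rng(0)
embedding_table = rng.normal(size=(VOCAB, DIM)).astype(np.float32)

def multi_hot_embed(feature_ids):
    """Sum-pool the embedding rows selected by a multi-hot feature
    (a variable-length list of sparse IDs, e.g. user interest tags)."""
    return embedding_table[feature_ids].sum(axis=0)

# One-hot is the special case of a single ID; multi-hot has several.
one_hot_vec   = multi_hot_embed([42])
multi_hot_vec = multi_hot_embed([3, 97, 512])
print(one_hot_vec.shape, multi_hot_vec.shape)  # (8,) (8,)
```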
GRACE: A compressed communication framework for distributed machine learning
Powerful computer clusters are used nowadays to train complex deep neural networks
(DNN) on large datasets. Distributed training increasingly becomes communication bound …
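The framework above concerns compressed gradient communication. A representative compressor from this family is top-k sparsification, sketched below; this is only an illustrative example of that class of compressor, not GRACE's actual interface:

```python
import numpy as np

def topk_compress(grad, ratio=0.01):
    """Keep only the largest-magnitude entries of the gradient (a common
    sparsification-style compressor); return (indices, values, length)."""
    k = max(1, int(grad.size * ratio))
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    return idx, grad[idx], grad.size

def topk_decompress(idx, vals, length):
    out = np.zeros(length, dtype=vals.dtype)
    out[idx] = vals
    return out

grad = np.random.randn(10_000).astype(np.float32)
idx, vals, n = topk_compress(grad, ratio=0.01)
approx = topk_decompress(idx, vals, n)
print(f"sent {vals.size} of {n} values")
```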
Priority-based parameter propagation for distributed DNN training
Data parallel training is widely used for scaling distributed deep neural network (DNN)
training. However, the performance benefits are often limited by the communication-heavy …
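The entry above is about ordering gradient transmission so that the parameters needed earliest by the next forward pass are sent first. The toy sketch below ranks gradient slices by layer index to show the priority idea; it ignores the online arrival of gradients during backpropagation and the paper's actual slicing policy:

```python
import heapq

def prioritized_transmission(layer_grad_sizes, slice_size=2):
    """Toy priority schedule: split each layer's gradient into slices and
    send slices for front (lower-index) layers first, since the next
    forward pass needs those parameters earliest."""
    heap = []
    for layer, size in enumerate(layer_grad_sizes):
        for offset in range(0, size, slice_size):
            # Lower layer index => higher priority (smaller heap key).
            heapq.heappush(heap, (layer, offset))
    order = []
    while heap:
        layer, offset = heapq.heappop(heap)
        order.append(f"layer{layer}[{offset}:{offset + slice_size}]")
    return order

# Gradients become available back-to-front, but are sent front-first.
print(prioritized_transmission([4, 4, 4]))
```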
Efficient sparse collective communication and its application to accelerate distributed deep learning
Efficient collective communication is crucial to parallel-computing applications such as
distributed training of large-scale recommendation systems and natural language …
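Sparse collective communication exploits the fact that many gradient tensors are mostly zero. A minimal sketch of the general idea, assuming a fixed block size chosen purely for illustration, sends and sums only the non-zero blocks:

```python
import numpy as np

BLOCK = 4

def nonzero_blocks(grad):
    """Split a gradient into fixed-size blocks and keep only non-zero ones,
    so mostly-zero tensors move far fewer bytes."""
    blocks = grad.reshape(-1, BLOCK)
    keep = np.flatnonzero(np.any(blocks != 0, axis=1))
    return {int(i): blocks[i] for i in keep}

def sparse_allreduce(worker_grads):
    """Aggregate per block: only blocks that some worker sent are summed."""
    total = np.zeros_like(worker_grads[0])
    out_blocks = total.reshape(-1, BLOCK)
    for grad in worker_grads:
        for i, blk in nonzero_blocks(grad).items():
            out_blocks[i] += blk
    return total

g1 = np.zeros(16); g1[0:2] = 1.0      # sparse gradients
g2 = np.zeros(16); g2[12] = 3.0
print(sparse_allreduce([g1, g2]))
```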
Accelerating decentralized federated learning in heterogeneous edge computing
In edge computing (EC), federated learning (FL) enables massive devices to collaboratively
train AI models without exposing local data. In order to avoid the possible bottleneck of the …
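Decentralized federated learning avoids a central aggregation server by having devices average models with their neighbors. The sketch below shows one such gossip-averaging round over an illustrative ring topology; the topology and uniform mixing weights are assumptions for the example, not the paper's method:

```python
import numpy as np

def gossip_round(models, neighbors):
    """One round of decentralized averaging: each device mixes its model
    with its neighbors' models instead of talking to a central server."""
    new_models = {}
    for node, model in models.items():
        group = [model] + [models[n] for n in neighbors[node]]
        new_models[node] = np.mean(group, axis=0)
    return new_models

# A ring of 4 edge devices, each averaging with its two ring neighbors.
models = {i: np.full(3, float(i)) for i in range(4)}
neighbors = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
for _ in range(10):
    models = gossip_round(models, neighbors)
print(models)  # values converge toward the global mean (1.5)
```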