Coded computing: Mitigating fundamental bottlenecks in large-scale distributed computing and machine learning
We introduce the concept of “coded computing”, a novel computing paradigm that utilizes
coding theory to effectively inject and leverage data/computation redundancy to mitigate …
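To make the redundancy idea concrete, here is a minimal toy sketch (not from the paper) of coded matrix-vector multiplication with a (3, 2) MDS-style code, where any two of three worker results suffice; the block sizes and straggler choice are illustrative assumptions:

```python
import numpy as np

# Toy (3, 2) MDS-coded matrix-vector multiply: split A into two halves,
# give a third worker the coded block A1 + A2. Any 2 of the 3 worker
# results suffice to recover A @ x, so one straggler can be ignored.
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 6))
x = rng.standard_normal(6)

A1, A2 = A[:2], A[2:]          # systematic blocks
A3 = A1 + A2                   # parity block

# Suppose the worker holding A2 straggles; decode from the other two.
y1 = A1 @ x
y3 = A3 @ x
y2_recovered = y3 - y1         # A2 @ x = (A1 + A2) @ x - A1 @ x

result = np.concatenate([y1, y2_recovered])
assert np.allclose(result, A @ x)
```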
Stochastic gradient coding for straggler mitigation in distributed learning
We consider distributed gradient descent in the presence of stragglers. Recent work on
gradient coding and approximate gradient coding has shown how to add redundancy in …
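A hedged sketch of the replication flavor of gradient coding this line of work builds on; the cyclic placement, replication factor, and straggler set below are illustrative assumptions, not the paper's scheme:

```python
import numpy as np

# Replication-based gradient coding sketch (assumptions: squared loss,
# each data partition stored on r = 2 workers). If one worker straggles,
# every one of its partitions still has a surviving replica, so the exact
# full gradient can be assembled from the remaining workers.
rng = np.random.default_rng(1)
n_parts = 4
partitions = [rng.standard_normal((8, 3)) for _ in range(n_parts)]
labels = [rng.standard_normal(8) for _ in range(n_parts)]
w = np.zeros(3)

def partial_grad(k):
    X, y = partitions[k], labels[k]
    return X.T @ (X @ w - y)

# Cyclic placement: worker i holds partitions {i, i+1 mod n_parts}.
placement = [(i, (i + 1) % n_parts) for i in range(n_parts)]
stragglers = {2}                        # worker 2 never responds

recovered = {}
for i, parts in enumerate(placement):
    if i in stragglers:
        continue
    for k in parts:
        recovered.setdefault(k, partial_grad(k))

grad = sum(recovered.values())          # exact if all partitions recovered
exact = sum(partial_grad(k) for k in range(n_parts))
assert np.allclose(grad, exact)
```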
Coded distributed computing with partial recovery
Coded computation techniques provide robustness against straggling workers in distributed
computing. However, most of the existing schemes require exact provisioning of the …
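A minimal sketch of the partial-recovery idea, using an uncoded toy for clarity; the target fraction, block layout, and zero-fill rule are assumptions, not the paper's scheme:

```python
import numpy as np

# Partial recovery sketch: instead of waiting until every block of A @ x
# is decodable, the master returns once a target fraction of blocks has
# arrived, treating missing blocks as zero (often tolerable for
# gradient-type computations).
rng = np.random.default_rng(2)
A = rng.standard_normal((8, 5))
x = rng.standard_normal(5)
blocks = np.split(A, 4)                  # 4 blocks, one per worker

target_fraction = 0.75
arrived = [0, 1, 3]                      # worker 2 is still straggling
estimate = np.zeros(8)
if len(arrived) / len(blocks) >= target_fraction:
    for i in arrived:
        estimate[2 * i: 2 * i + 2] = blocks[i] @ x
# 'estimate' matches A @ x on 3 of 4 blocks; error is confined to block 2.
```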
Straggler-aware distributed learning: Communication–computation latency trade-off
When gradient descent (GD) is scaled to many parallel workers for large-scale machine
learning applications, its per-iteration computation time is limited by straggling workers …
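A toy sketch of the multi-message idea behind this trade-off; the chunk sizes, exponential latency model, and deadline rule below are illustrative assumptions:

```python
import numpy as np

# Communication-computation trade-off sketch: workers process data in
# small chunks and transmit a partial gradient after each chunk, so by any
# deadline the master can aggregate whatever has arrived instead of
# waiting for full per-worker gradients.
rng = np.random.default_rng(3)
X = rng.standard_normal((12, 3))
y = rng.standard_normal(12)
w = np.zeros(3)

chunks = np.split(np.arange(12), 6)          # 6 chunks of 2 samples each
finish_time = rng.exponential(1.0, size=6)   # simulated completion times
deadline = np.median(finish_time)            # earlier deadline => fewer chunks

grad, n_samples = np.zeros(3), 0
for idx, t in zip(chunks, finish_time):
    if t <= deadline:                        # this partial gradient arrived
        Xc, yc = X[idx], y[idx]
        grad += Xc.T @ (Xc @ w - yc)
        n_samples += len(idx)
grad /= max(n_samples, 1)                    # gradient over received samples
```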
Berrut approximated coded computing: Straggler resistance beyond polynomial computing
T Jahani-Nezhad… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
One of the major challenges in using distributed learning to train complicated models with
large data sets is dealing with the straggler effect. As a solution, coded computation has been …
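The decoder here is Berrut's rational interpolant, which is defined for any subset of returned evaluation points, which is what makes it robust to arbitrary straggler patterns; the Chebyshev nodes and stand-in target function below are illustrative assumptions:

```python
import numpy as np

def berrut_interpolate(x_nodes, f_values, x_eval):
    """Berrut's rational interpolant with weights (-1)^j; it has no real
    poles for ordered nodes and works for ANY subset of returned points."""
    w = np.array([(-1.0) ** j for j in range(len(x_nodes))])
    diffs = x_eval - x_nodes
    if np.any(diffs == 0):               # exactly on a node: return its value
        return f_values[np.argmin(np.abs(diffs))]
    terms = w / diffs
    return np.dot(terms, f_values) / terms.sum()

# Toy use (assumed setup): workers evaluate f at Chebyshev points; only a
# subset returns, yet the master still forms an approximation of f(0.3).
f = lambda z: np.tanh(3 * z)             # stand-in for the target computation
nodes = np.cos(np.pi * (np.arange(8) + 0.5) / 8)   # Chebyshev points
returned = [0, 1, 3, 4, 6, 7]            # workers 2 and 5 straggled
approx = berrut_interpolate(nodes[returned], f(nodes[returned]), 0.3)
print(approx, f(0.3))                    # close, not exact: approximate decoding
```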
Codedsketch: A coding scheme for distributed computation of approximated matrix multiplication
T Jahani-Nezhad… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
In this paper, we propose CodedSketch, a distributed straggler-resistant scheme to
compute an approximation of the multiplication of two massive matrices. The objective is to …
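A toy sketch of the count-sketch ingredient behind approximate matrix multiplication; the coding/straggler layer of CodedSketch is omitted, and the dimensions are assumptions:

```python
import numpy as np

# Sketching-based approximate matrix multiplication: a count-sketch matrix
# S satisfies E[S.T @ S] = I, so (A @ S.T) @ (S @ B) is an unbiased
# estimate of A @ B computed in a lower inner dimension.
rng = np.random.default_rng(4)
n, d, m = 6, 200, 50                      # inner dimension 200 -> 50
A = rng.standard_normal((n, d))
B = rng.standard_normal((d, n))

S = np.zeros((m, d))
rows = rng.integers(0, m, size=d)         # each column hashed to one row
signs = rng.choice([-1.0, 1.0], size=d)
S[rows, np.arange(d)] = signs

approx = (A @ S.T) @ (S @ B)
err = np.linalg.norm(approx - A @ B) / np.linalg.norm(A @ B)
print(f"relative error: {err:.2f}")       # shrinks as the sketch size m grows
```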
Slow and stale gradients can win the race
Distributed Stochastic Gradient Descent (SGD), when run in a synchronous manner, suffers
from delays in runtime as it waits for the slowest workers (stragglers). Asynchronous …
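A minimal sketch of one common staleness-damped update rule for asynchronous SGD; the eta/(1 + staleness) damping and the random delay model are assumptions, not necessarily the paper's rule:

```python
import numpy as np

# Staleness-aware asynchronous SGD sketch: each update is computed on an
# older parameter snapshot, and the step size is damped by the staleness
# so that stale gradients do not destabilize training.
rng = np.random.default_rng(5)
X = rng.standard_normal((64, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true

w, eta = np.zeros(3), 0.05
snapshots = {0: w.copy()}                 # parameter versions workers read
for t in range(1, 200):
    staleness = rng.integers(0, 4)        # this update used an older model
    w_read = snapshots[max(t - 1 - staleness, 0)]
    grad = X.T @ (X @ w_read - y) / len(y)
    w = w - eta / (1 + staleness) * grad  # damp stale updates
    snapshots[t] = w.copy()
print(np.linalg.norm(w - w_true))         # still converges despite staleness
```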
Generalized fractional repetition codes for binary coded computations
This paper addresses the gradient coding and coded matrix multiplication problems in
distributed optimization and coded computing. We present a computationally efficient coding …
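A toy sketch of a fractional repetition gradient code with binary (sum-only) decoding; the group sizes and straggler pattern are illustrative assumptions:

```python
import numpy as np

# Fractional repetition gradient code sketch (assumed parameters: 6 workers
# in 2 groups of 3; every worker in a group stores the same partitions and
# sends one binary-coded message, the plain SUM of its partition gradients,
# so decoding needs additions only). Tolerates up to 2 stragglers per group.
rng = np.random.default_rng(6)
partial = [rng.standard_normal(3) for _ in range(6)]   # partition gradients
groups = [[0, 1, 2], [3, 4, 5]]                        # partitions per group

def worker_msg(g):            # identical message from every worker in group g
    return sum(partial[k] for k in groups[g])

stragglers = {0, 2, 4}        # workers 0, 2 (group 0) and 4 (group 1) are slow
grad = np.zeros(3)
for g in range(2):
    alive = [w for w in range(3 * g, 3 * g + 3) if w not in stragglers]
    assert alive, "a whole group straggled; exact decoding impossible"
    grad += worker_msg(g)     # any one survivor's message suffices

assert np.allclose(grad, sum(partial))   # exact full gradient recovered
```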
Straggler-resilient personalized federated learning
Federated Learning is an emerging learning paradigm that allows training models from
samples distributed across a large network of clients while respecting privacy and …
Approximate gradient coding with optimal decoding
Gradient codes use data replication to mitigate the effect of straggling machines in
distributed machine learning. Approximate gradient codes consider codes where the data …
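A minimal sketch of least-squares decoding (optimal in the l2 sense) for an approximate gradient code; the cyclic placement and straggler choice are illustrative assumptions:

```python
import numpy as np

# Least-squares decoding sketch (assumptions: 5 workers, 5 partitions,
# cyclic placement G[i, k] = 1 iff worker i holds partition k; each worker
# returns the sum of its partition gradients, and the master picks decoding
# weights v minimizing ||G_S^T v - 1||_2, i.e. the combination of received
# messages closest to the full gradient).
rng = np.random.default_rng(7)
n = 5
G = np.eye(n) + np.roll(np.eye(n), 1, axis=1)   # worker i holds parts i, i+1
partial = rng.standard_normal((n, 3))           # per-partition gradients
msgs = G @ partial                              # coded worker messages

alive = [0, 1, 3, 4]                            # worker 2 straggled
v, *_ = np.linalg.lstsq(G[alive].T, np.ones(n), rcond=None)
decoded = v @ msgs[alive]
full = partial.sum(axis=0)
print(np.linalg.norm(decoded - full))           # small residual decoding error
```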