A comprehensive survey on coded distributed computing: Fundamentals, challenges, and networking applications
Distributed computing has become a common approach for large-scale computation tasks
due to benefits such as high reliability, scalability, computation speed, and cost …
Private and secure distributed matrix multiplication with flexible communication load
Large matrix multiplications are central to large-scale machine learning applications. These
operations are often carried out on a distributed computing platform with a master server and …
Coded computing: Mitigating fundamental bottlenecks in large-scale distributed computing and machine learning
We introduce the concept of “coded computing”, a novel computing paradigm that utilizes
coding theory to effectively inject and leverage data/computation redundancy to mitigate …
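The redundancy idea in this abstract can be illustrated in the simplest setting we can assume: a matrix-vector product A·x split across three workers with a (3, 2) parity code, so the result survives any single straggler. This is a minimal sketch; the function names (`matvec`, `encode`, `decode`) and the parity construction are illustrative choices, not the scheme from the cited paper.

```python
from typing import List, Optional

Matrix = List[List[float]]
Vector = List[float]


def matvec(A: Matrix, x: Vector) -> Vector:
    """Plain matrix-vector product A @ x (what each worker computes)."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def encode(A: Matrix) -> List[Matrix]:
    """Split A row-wise into halves A1, A2 and add a parity block A1 + A2.

    Illustrative assumption: A has an even number of rows.
    """
    half = len(A) // 2
    A1, A2 = A[:half], A[half:]
    parity = [[a + b for a, b in zip(r1, r2)] for r1, r2 in zip(A1, A2)]
    return [A1, A2, parity]


def decode(results: List[Optional[Vector]]) -> Vector:
    """Recover A @ x from any 2 of the 3 worker results (None marks a straggler)."""
    y1, y2, y3 = results
    if y1 is not None and y2 is not None:
        return y1 + y2  # concatenate top-half and bottom-half results
    if y1 is not None and y3 is not None:
        return y1 + [p - a for p, a in zip(y3, y1)]  # A2 x = (A1 + A2) x - A1 x
    if y2 is not None and y3 is not None:
        return [p - b for p, b in zip(y3, y2)] + y2  # A1 x = (A1 + A2) x - A2 x
    raise ValueError("need at least 2 of the 3 worker results")


# Usage: three workers, the first one straggles.
A = [[1, 2], [3, 4], [5, 6], [7, 8]]
x = [1, 1]
worker_results = [matvec(block, x) for block in encode(A)]
print(decode([None, worker_results[1], worker_results[2]]))
```

Any two of the three worker outputs suffice, so the master does not wait on the slowest worker; the price is one extra block of computation, which is the basic latency-for-redundancy trade that coded computing generalizes.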
Analog Lagrange coded computing
A distributed computing scenario is considered, where the computational power of a set of
worker nodes is used to perform a certain computation task over a dataset that is dispersed …
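As an illustration only, here is a toy version of Lagrange coded computing, the general construction that an analog variant builds on, assuming exact rational arithmetic rather than the real-valued setting of the cited paper. The data points X_j are placed on a polynomial u(z) with u(β_j) = X_j, worker i computes f(u(α_i)), and because f(u(z)) has degree deg(f)(k-1), any deg(f)(k-1)+1 results let the master interpolate it and read off f(X_j) = f(u(β_j)). The helper names and the choice f(x) = x² are hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence


def lagrange_eval(xs: Sequence, ys: Sequence, z) -> Fraction:
    """Evaluate the unique polynomial through the points (xs[i], ys[i]) at z."""
    xs = [Fraction(v) for v in xs]
    z = Fraction(z)
    total = Fraction(0)
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = Fraction(yi)
        for j, xj in enumerate(xs):
            if j != i:
                term *= (z - xj) / (xi - xj)
        total += term
    return total


def recovery_threshold(k: int, deg_f: int) -> int:
    """Number of worker results needed: deg(f) * (k - 1) + 1."""
    return deg_f * (k - 1) + 1


def lcc_encode(data: Sequence, betas: Sequence, alphas: Sequence) -> List[Fraction]:
    """Worker i receives u(alphas[i]), where u(betas[j]) = data[j]."""
    return [lagrange_eval(betas, data, a) for a in alphas]


def lcc_decode(alphas_done: Sequence, results: Sequence, betas: Sequence) -> List[Fraction]:
    """Interpolate f(u(z)) from returned results, then evaluate at each beta_j."""
    return [lagrange_eval(alphas_done, results, b) for b in betas]


def f(v: Fraction) -> Fraction:
    """Hypothetical task applied by every worker (degree 2)."""
    return v * v


# Usage: k = 2 data points, N = 4 workers, the worker at alpha = 3 straggles.
data, betas, alphas = [3, 5], [0, 1], [2, 3, 4, 5]
encoded = lcc_encode(data, betas, alphas)
results = [f(e) for e in encoded]
done = [0, 2, 3]  # indices of the workers that returned in time
print(lcc_decode([alphas[i] for i in done], [results[i] for i in done], betas))
```

With k = 2 and deg(f) = 2, three of the four workers are enough, so one straggler is tolerated. Exact fractions hide the numerical issues that arise when the same interpolation is carried out over the reals.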
Coded sparse matrix computation schemes that leverage partial stragglers
Distributed matrix computations over large clusters can suffer from the problem of slow or
failed worker nodes (called stragglers) which can dominate the overall job execution time …
Numerically stable coded matrix computations via circulant and rotation matrix embeddings
Polynomial-based methods have recently been used in several works for mitigating the
effect of stragglers (slow or failed nodes) in distributed matrix computations. For a system …
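A minimal sketch of a polynomial-based code for C = AB, assuming 2x2 matrices with A split into row blocks A0, A1 and B into column blocks B0, B1. Worker i evaluates (A0 + A1·x)(B0 + B1·x²) at its own point x_i; the product polynomial has degree 3 with coefficients A0B0, A1B0, A0B1, A1B1, so any 4 workers determine C. Exact fractions are used to sidestep conditioning; in floating point the decoding step amounts to solving a real Vandermonde system, which is where numerical stability becomes a concern. All names here are hypothetical.

```python
from fractions import Fraction
from typing import List, Sequence


def poly_mul(p: List[Fraction], q: List[Fraction]) -> List[Fraction]:
    """Multiply two polynomials given as coefficient lists (lowest degree first)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def interpolate_coeffs(xs: Sequence[int], ys: Sequence) -> List[Fraction]:
    """Coefficients (lowest degree first) of the polynomial through (xs[i], ys[i])."""
    n = len(xs)
    coeffs = [Fraction(0)] * n
    for i in range(n):
        basis, denom = [Fraction(1)], Fraction(1)
        for j in range(n):
            if j != i:
                basis = poly_mul(basis, [Fraction(-xs[j]), Fraction(1)])  # (z - x_j)
                denom *= xs[i] - xs[j]
        for d, c in enumerate(basis):
            coeffs[d] += ys[i] * c / denom
    return coeffs


def worker(A, B, x):
    """Compute (A0 + A1 x)(B0 + B1 x^2) for row blocks A0, A1 and column blocks B0, B1."""
    a_enc = [A[0][k] + A[1][k] * x for k in range(2)]
    b_enc = [B[k][0] + B[k][1] * x ** 2 for k in range(2)]
    return sum(a * b for a, b in zip(a_enc, b_enc))


def decode(xs: Sequence[int], ys: Sequence) -> List[List[Fraction]]:
    """Map the 4 recovered coefficients back to the entries of C = AB."""
    c0, c1, c2, c3 = interpolate_coeffs(xs, ys)
    return [[c0, c2], [c1, c3]]


# Usage: 5 workers at x = 1..5, the worker at x = 3 straggles.
A, B = [[1, 2], [3, 4]], [[5, 6], [7, 8]]
points = [1, 2, 4, 5]
print(decode(points, [worker(A, B, x) for x in points]))
```

Any 4 of the 5 evaluations work, so one straggler is tolerated; splitting A and B more finely raises the polynomial degree and therefore the number of results the master must wait for.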
Berrut approximated coded computing: Straggler resistance beyond polynomial computing
T Jahani-Nezhad… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
One of the major challenges in using distributed learning to train complicated models with
large data sets is dealing with the straggler effect. As a solution, coded computation has been …
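The primitive named in the title is Berrut's rational interpolant, r(z) = [Σ_i (-1)^i f_i / (z - x_i)] / [Σ_i (-1)^i / (z - x_i)], which is known to have no poles on the real line. The sketch below only evaluates this interpolant on Chebyshev points; it does not reproduce the encoding and decoding pipeline of the cited paper, and the function name `berrut` is hypothetical.

```python
import math
from typing import Sequence


def berrut(xs: Sequence[float], fs: Sequence[float], z: float) -> float:
    """Berrut's first rational interpolant through (xs[i], fs[i]), evaluated at z.

    Nodes are assumed to be distinct and listed in monotone order.
    """
    num = 0.0
    den = 0.0
    for i, (x, f) in enumerate(zip(xs, fs)):
        if z == x:
            return f  # interpolation condition: exact at the nodes
        w = (-1) ** i / (z - x)
        num += w * f
        den += w
    return num / den


# Usage: approximate sin on [-1, 1] from samples at 40 Chebyshev points.
n = 40
nodes = [math.cos((2 * j + 1) * math.pi / (2 * n)) for j in range(n)]
samples = [math.sin(t) for t in nodes]
print(berrut(nodes, samples, 0.3), math.sin(0.3))
```

Unlike polynomial interpolation, the interpolant stays well defined for any subset of returned nodes kept in order, which is what makes it attractive when the set of non-straggling workers is not known in advance; the cost is that the result is an approximation rather than an exact recovery.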
Random Khatri-Rao-product codes for numerically-stable distributed matrix multiplication
We propose a class of codes called random Khatri-Rao-Product (RKRP) codes for
distributed matrix multiplication in the presence of stragglers. The main advantage of the …
List-decodable coded computing: Breaking the adversarial toleration barrier
We consider the problem of coded computing, where a computational task is performed in a
distributed fashion in the presence of adversarial workers. We propose techniques to break …
Straggler-resistant distributed matrix computation via coding theory: Removing a bottleneck in large-scale data processing
The current big data era routinely requires the processing of large-scale data on massive
distributed computing clusters. In these applications, data sets are often so large that they …