A comprehensive survey on coded distributed computing: Fundamentals, challenges, and networking applications

JS Ng, WYB Lim, NC Luong, Z Xiong… - … Surveys & Tutorials, 2021 - ieeexplore.ieee.org
Distributed computing has become a common approach for large-scale computation tasks
due to benefits such as high reliability, scalability, computation speed, and cost …

Private and secure distributed matrix multiplication with flexible communication load

M Aliasgari, O Simeone… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Large matrix multiplications are central to large-scale machine learning applications. These
operations are often carried out on a distributed computing platform with a master server and …

Coded computing: Mitigating fundamental bottlenecks in large-scale distributed computing and machine learning

S Li, S Avestimehr - Foundations and Trends® in …, 2020 - nowpublishers.com
We introduce the concept of “coded computing”, a novel computing paradigm that utilizes
coding theory to effectively inject and leverage data/computation redundancy to mitigate …

Analog Lagrange coded computing

M Soleymani, H Mahdavifar… - IEEE Journal on …, 2021 - ieeexplore.ieee.org
A distributed computing scenario is considered, where the computational power of a set of
worker nodes is used to perform a certain computation task over a dataset that is dispersed …

Coded sparse matrix computation schemes that leverage partial stragglers

AB Das, A Ramamoorthy - IEEE Transactions on Information …, 2022 - ieeexplore.ieee.org
Distributed matrix computations over large clusters can suffer from the problem of slow or
failed worker nodes (called stragglers) which can dominate the overall job execution time …

Numerically stable coded matrix computations via circulant and rotation matrix embeddings

A Ramamoorthy, L Tang - IEEE Transactions on Information …, 2021 - ieeexplore.ieee.org
Polynomial based methods have recently been used in several works for mitigating the
effect of stragglers (slow or failed nodes) in distributed matrix computations. For a system …

Berrut approximated coded computing: Straggler resistance beyond polynomial computing

T Jahani-Nezhad… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
One of the major challenges in using distributed learning to train complicated models with
large data sets is to deal with stragglers effect. As a solution, coded computation has been …

Random Khatri-Rao-product codes for numerically-stable distributed matrix multiplication

AM Subramaniam, A Heidarzadeh… - 2019 57th Annual …, 2019 - ieeexplore.ieee.org
We propose a class of codes called random Khatri-Rao-Product (RKRP) codes for
distributed matrix multiplication in the presence of stragglers. The main advantage of the …

List-decodable coded computing: Breaking the adversarial toleration barrier

M Soleymani, RE Ali, H Mahdavifar… - IEEE Journal on …, 2021 - ieeexplore.ieee.org
We consider the problem of coded computing, where a computational task is performed in a
distributed fashion in the presence of adversarial workers. We propose techniques to break …

Straggler-resistant distributed matrix computation via coding theory: Removing a bottleneck in large-scale data processing

A Ramamoorthy, AB Das, L Tang - IEEE Signal Processing …, 2020 - ieeexplore.ieee.org
The current big data era routinely requires the processing of large-scale data on massive
distributed computing clusters. In these applications, data sets are often so large that they …