A comprehensive survey on coded distributed computing: Fundamentals, challenges, and networking applications

JS Ng, WYB Lim, NC Luong, Z Xiong… - … Surveys & Tutorials, 2021 - ieeexplore.ieee.org
Distributed computing has become a common approach for large-scale computation tasks
due to benefits such as high reliability, scalability, computation speed, and cost …

Polynomial codes: an optimal design for high-dimensional coded matrix multiplication

Q Yu, M Maddah-Ali… - Advances in Neural …, 2017 - proceedings.neurips.cc
We consider a large-scale matrix multiplication problem where the computation is carried
out using a distributed system with a master node and multiple worker nodes, where each …

Speeding up distributed machine learning using codes

K Lee, M Lam, R Pedarsani… - IEEE Transactions …, 2017 - ieeexplore.ieee.org
Codes are widely used in many engineering applications to offer robustness against noise.
In large-scale systems, there are several types of noise that can affect the performance of …

Straggler mitigation in distributed matrix multiplication: Fundamental limits and optimal coding

Q Yu, MA Maddah-Ali… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
We consider the problem of massive matrix multiplication, which underlies many data
analytic applications, in a large-scale distributed system comprising a group of worker …

A fundamental tradeoff between computation and communication in distributed computing

S Li, MA Maddah-Ali, Q Yu… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org
How can we optimally trade extra computing power to reduce the communication load in
distributed computing? We answer this question by characterizing a fundamental tradeoff …

Short-dot: Computing large linear transforms distributedly using coded short dot products

S Dutta, V Cadambe, P Grover - Advances in Neural …, 2016 - proceedings.neurips.cc
Faced with saturation of Moore's law and increasing size and dimension of data, system
designers have increasingly resorted to parallel and distributed computing to reduce …

On the optimal recovery threshold of coded matrix multiplication

S Dutta, M Fahim, F Haddadpour… - IEEE Transactions …, 2019 - ieeexplore.ieee.org
We provide novel coded computation strategies for distributed matrix-matrix products that
outperform the recent “Polynomial code” constructions in recovery threshold, i.e., the required …

Coded computation over heterogeneous clusters

A Reisizadeh, S Prakash, R Pedarsani… - IEEE Transactions …, 2019 - ieeexplore.ieee.org
In large-scale distributed computing clusters, such as Amazon EC2, there are several types
of “system noise” that can result in major degradation of performance: system failures …

Coded sparse matrix multiplication

S Wang, J Liu, N Shroff - International Conference on …, 2018 - proceedings.mlr.press
In a large-scale and distributed matrix multiplication problem $C = A^{\intercal} B$, where
$C \in \mathbb{R}^{r \times t}$, the coded computation plays an important role to effectively …

Coded computing: Mitigating fundamental bottlenecks in large-scale distributed computing and machine learning

S Li, S Avestimehr - Foundations and Trends® in …, 2020 - nowpublishers.com
We introduce the concept of “coded computing”, a novel computing paradigm that utilizes
coding theory to effectively inject and leverage data/computation redundancy to mitigate …