A comprehensive survey on coded distributed computing: Fundamentals, challenges, and networking applications
Distributed computing has become a common approach for large-scale computation tasks
due to benefits such as high reliability, scalability, computation speed, and cost …
Joint device scheduling and resource allocation for latency constrained wireless federated learning
In federated learning (FL), devices contribute to the global training by uploading their local
model updates via wireless channels. Due to limited computation and communication …
Speeding up distributed machine learning using codes
Codes are widely used in many engineering applications to offer robustness against noise.
In large-scale systems, there are several types of noise that can affect the performance of …
Gradient coding: Avoiding stragglers in distributed learning
We propose a novel coding theoretic framework for mitigating stragglers in distributed
learning. We show how carefully replicating data blocks and coding across gradients can …
Polynomial codes: an optimal design for high-dimensional coded matrix multiplication
We consider a large-scale matrix multiplication problem where the computation is carried
out using a distributed system with a master node and multiple worker nodes, where each …
A fundamental tradeoff between computation and communication in distributed computing
How can we optimally trade extra computing power to reduce the communication load in
distributed computing? We answer this question by characterizing a fundamental tradeoff …
Short-dot: Computing large linear transforms distributedly using coded short dot products
Faced with saturation of Moore's law and increasing size and dimension of data, system
designers have increasingly resorted to parallel and distributed computing to reduce …
Straggler mitigation in distributed matrix multiplication: Fundamental limits and optimal coding
We consider the problem of massive matrix multiplication, which underlies many data
analytic applications, in a large-scale distributed system comprising a group of worker …
On the optimal recovery threshold of coded matrix multiplication
We provide novel coded computation strategies for distributed matrix-matrix products that
outperform the recent “Polynomial code” constructions in recovery threshold, i.e., the required …
High-dimensional coded matrix multiplication
Coded computation is a framework for providing redundancy in distributed computing
systems to make them robust to slower nodes, or stragglers. In [1], the authors propose a …