Federated learning via over-the-air computation
The stringent requirements for low latency and privacy of the emerging high-stakes applications with intelligent devices, such as drones and smart vehicles, make the cloud …
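The technique named in the title aggregates model updates through simultaneous analog transmission. Below is a minimal numpy sketch of that core idea, under simplifying assumptions (a real-valued channel with gains known at each device, toy Gaussian updates), not the paper's actual transceiver design: every device pre-scales its update by the inverse of its channel gain, all devices transmit at once, and the receiver reads off a noisy sum in a single channel use.

    import numpy as np

    rng = np.random.default_rng(0)
    K, d = 10, 5                        # devices, model dimension
    updates = rng.normal(size=(K, d))   # toy local model updates
    h = rng.uniform(0.5, 2.0, size=K)   # real channel gains, known at devices

    # Each device pre-scales by 1/h_k so the channel itself "computes" the sum.
    tx = updates / h[:, None]
    noise = 0.01 * rng.normal(size=d)
    rx = (h[:, None] * tx).sum(axis=0) + noise  # superposition at the receiver

    avg = rx / K                        # one-shot estimate of the average update
    print(np.allclose(avg, updates.mean(axis=0), atol=0.01))  # True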
Lagrange coded computing: Optimal design for resiliency, security, and privacy
We consider a scenario involving computations over a massive dataset stored distributedly
across multiple workers, which is at the core of distributed learning algorithms. We propose …
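Below is a toy numpy sketch of the Lagrange encoding idea, simplified to real arithmetic (the actual construction works over a finite field and adds random padding for security and privacy): the master encodes k data blocks as evaluations of the Lagrange interpolation polynomial through them, each worker applies the target function f to one coded block, and any deg(f)·(k−1)+1 responses suffice to interpolate f on every original block.

    import numpy as np

    rng = np.random.default_rng(1)
    k = 2                                    # data blocks
    beta = np.array([0.0, 1.0])              # interpolation points for the data
    alpha = np.array([2.0, 3.0, 4.0, 5.0])   # one evaluation point per worker
    X = rng.normal(size=(k, 3, 3))           # two 3x3 data blocks
    f = lambda M: M.T @ M                    # degree-2 polynomial map

    def lagrange_basis(pts, z):
        """l_j(z) for the interpolation points pts."""
        return np.array([np.prod([(z - pts[m]) / (pts[j] - pts[m])
                                  for m in range(len(pts)) if m != j])
                         for j in range(len(pts))])

    # Encode: worker i receives u(alpha_i), where u interpolates the blocks.
    coded = [sum(lagrange_basis(beta, a)[j] * X[j] for j in range(k)) for a in alpha]
    results = [f(c) for c in coded]          # each worker applies f locally

    # Decode from any deg(f)*(k-1)+1 = 3 workers; here worker 1 straggles.
    idx = [0, 2, 3]
    recovered = [sum(lagrange_basis(alpha[idx], b)[t] * results[idx[t]]
                     for t in range(3)) for b in beta]
    print(np.allclose(recovered[0], f(X[0])), np.allclose(recovered[1], f(X[1])))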
Towards demystifying serverless machine learning training
The appeal of serverless computing (FaaS) has triggered a growing interest in how to use it in data-intensive applications such as ETL, query processing, or machine learning (ML). Several …
Short-dot: Computing large linear transforms distributedly using coded short dot products
Faced with the saturation of Moore's law and the increasing size and dimension of data, system designers have increasingly resorted to parallel and distributed computing to reduce …
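Short-Dot's own construction makes the coded rows sparse so that each worker computes a genuinely shorter dot product; as a simpler illustration of the coded dot-product idea it builds on, here is an MDS-style numpy sketch (an expository baseline, not the paper's construction) in which the row blocks of A are mixed by a Vandermonde generator so that any k of the n workers' partial products recover Ax.

    import numpy as np

    rng = np.random.default_rng(2)
    k, n = 3, 5                            # data blocks, workers
    A = rng.normal(size=(6, 4))
    x = rng.normal(size=4)
    blocks = A.reshape(k, -1, 4)           # three 2x4 row blocks of A

    G = np.vander(np.arange(1, n + 1), k, increasing=True).astype(float)
    coded = np.einsum('ij,jkl->ikl', G, blocks)   # n coded row blocks

    partial = coded @ x                    # each worker's small result
    alive = [0, 2, 4]                      # any k responsive workers suffice
    dec = np.linalg.solve(G[alive], partial[alive])
    print(np.allclose(dec.reshape(-1), A @ x))    # True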
On the optimal recovery threshold of coded matrix multiplication
We provide novel coded computation strategies for distributed matrix-matrix products that outperform the recent “Polynomial code” constructions in recovery threshold, i.e., the required …
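The recovery threshold is the number of workers the master must hear from before it can decode the product. Below is a toy numpy sketch of the baseline “Polynomial code” construction that this paper improves on, with A and B each split into two blocks and threshold mn = 4: each worker multiplies two coded blocks, which amounts to evaluating a degree-3 matrix polynomial whose coefficients are exactly the block products A_i B_j.

    import numpy as np

    rng = np.random.default_rng(3)
    m = 2
    A, B = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
    Ab = A.reshape(m, 2, 4)                     # row blocks A_0, A_1
    Bb = np.stack(np.split(B, m, axis=1))       # column blocks B_0, B_1

    pts = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # one evaluation point per worker
    # Worker at point x computes (A_0 + x A_1)(B_0 + x^m B_1), degree 3 in x.
    work = [(Ab[0] + x * Ab[1]) @ (Bb[0] + x**m * Bb[1]) for x in pts]

    # Any mn = 4 results pin down the 4 coefficients; here worker 2 straggles.
    idx = [0, 1, 3, 4]
    V = np.vander(pts[idx], 4, increasing=True)
    coef = np.einsum('ij,jkl->ikl', np.linalg.inv(V), np.stack([work[i] for i in idx]))
    C = np.block([[coef[0], coef[2]], [coef[1], coef[3]]])
    print(np.allclose(C, A @ B))                # True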
Coded computation over heterogeneous clusters
In large-scale distributed computing clusters, such as Amazon EC2, there are several types
of “system noise” that can result in major degradation of performance: system failures …
Coded computing for low-latency federated learning over wireless edge networks
Federated learning enables training a global model from data located at the client nodes, without sharing or moving client data to a centralized server. Performance of …
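For context, here is a minimal numpy sketch of the plain federated averaging loop that the snippet describes (a toy least-squares model, not this paper's coded scheme): every client runs a few local gradient steps on its private data, and only the resulting models travel to the server, which averages them into the next global model.

    import numpy as np

    rng = np.random.default_rng(4)
    w_true = np.array([1.0, -2.0, 0.5])
    clients = [rng.normal(loc=c, size=(50, 3)) for c in range(4)]  # private data
    labels = [X @ w_true for X in clients]                         # toy targets
    w = np.zeros(3)                                                # global model

    for rnd in range(50):                      # communication rounds
        local = []
        for X, y in zip(clients, labels):
            wk = w.copy()
            for _ in range(5):                 # local gradient steps on private data
                wk -= 0.01 * 2 * X.T @ (X @ wk - y) / len(X)
            local.append(wk)
        w = np.mean(local, axis=0)             # server averages models, never sees data
    print(np.round(w, 2))                      # close to w_true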
Gradient coding from cyclic MDS codes and expander graphs
Gradient coding is a technique for straggler mitigation in distributed learning. In this paper
we design novel gradient codes using tools from classical coding theory, namely, cyclic …
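Below is a small numpy sketch of the canonical three-worker example from the gradient coding literature (the constructions in this paper generalize it via cyclic MDS codes and expander graphs): each worker holds two of three data parts and returns one fixed linear combination of its partial gradients, and the master recovers the full gradient sum from any two responses, tolerating one straggler.

    import numpy as np

    rng = np.random.default_rng(5)
    g = rng.normal(size=(3, 4))          # partial gradients g1, g2, g3

    # Encoding matrix B: worker i returns B[i] @ g (each row touches 2 parts).
    B = np.array([[0.5, 1.0, 0.0],       # worker 1 sends g1/2 + g2
                  [0.0, 1.0, -1.0],      # worker 2 sends g2 - g3
                  [0.5, 0.0, 1.0]])      # worker 3 sends g1/2 + g3
    sent = B @ g

    # Decoding vectors a with a @ B[S] = [1, 1, 1], one per 2-subset S.
    decode = {(0, 1): [2.0, -1.0], (1, 2): [1.0, 2.0], (0, 2): [1.0, 1.0]}
    for S, a in decode.items():          # any 2 survivors recover sum(g)
        ok = np.allclose(np.array(a) @ sent[list(S)], g.sum(axis=0))
        print(S, ok)                     # True for every pair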
Motivating workers in federated learning: A Stackelberg game perspective
Due to the large size of the training data, distributed learning approaches such as federated
learning have gained attention recently. However, the convergence rate of distributed …
Slow and stale gradients can win the race: Error-runtime trade-offs in distributed SGD
Distributed Stochastic Gradient Descent (SGD), when run in a synchronous manner, suffers from delays in waiting for the slowest learners (stragglers). Asynchronous methods …
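Below is a small numpy simulation of the error-runtime knob the abstract alludes to, under assumed i.i.d. exponential worker runtimes (an assumption for illustration): K-sync SGD waits each step for only the fastest K of P workers, so the per-step time falls from the maximum runtime to the K-th order statistic, at the price of averaging fewer stochastic gradients.

    import numpy as np

    rng = np.random.default_rng(6)
    P, steps = 8, 200
    w_true = np.array([1.0, -2.0])

    for K in (2, 8):                        # K = P is fully synchronous SGD
        w, t = np.zeros(2), 0.0
        for _ in range(steps):
            runtimes = rng.exponential(1.0, size=P)
            t += np.sort(runtimes)[K - 1]   # wait only for the fastest K workers
            grads = [2 * (w - w_true) + rng.normal(size=2) for _ in range(K)]
            w -= 0.05 * np.mean(grads, axis=0)
        print(f"K={K}: wall-clock={t:7.1f}  error={np.linalg.norm(w - w_true):.3f}")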