Communication-efficient distributed learning: An overview
Distributed learning is envisioned as the bedrock of next-generation intelligent networks,
where intelligent agents, such as mobile devices, robots, and sensors, exchange information …
Communication-efficient distributed deep learning: A comprehensive survey
Distributed deep learning (DL) has become prevalent in recent years to reduce training time
by leveraging multiple computing devices (e.g., GPUs/TPUs) due to larger models and …
EF21: A new, simpler, theoretically better, and practically faster error feedback
Error feedback (EF), also known as error compensation, is an immensely popular
convergence stabilization mechanism in the context of distributed training of supervised …
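To make the mechanism concrete, here is a minimal NumPy sketch of one EF21-style round, assuming a Top-k contractive compressor; the names topk, ef21_step and the parameters gamma, k are illustrative, not taken from the paper's code. Each worker keeps a running gradient estimate g_i and only transmits the compressed correction toward its fresh gradient, so no separate error buffer is needed.

```python
import numpy as np

def topk(v, k):
    """Top-k compressor: keep the k largest-magnitude entries, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def ef21_step(x, g_locals, grads, gamma, k):
    """One EF21-style round. grads[i] is a callable returning worker i's gradient.
    The server steps with the running estimate g = mean(g_i); each worker then
    sends the compressed correction topk(grad_i(x_new) - g_i) and shifts g_i by it."""
    g = np.mean(g_locals, axis=0)
    x_new = x - gamma * g
    new_g_locals = [g_i + topk(grad_i(x_new) - g_i, k)
                    for g_i, grad_i in zip(g_locals, grads)]
    return x_new, new_g_locals
```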
Optimal client sampling for federated learning
It is well understood that client-master communication can be a primary bottleneck in
Federated Learning. In this work, we address this issue with a novel client subsampling …
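As a rough illustration of the underlying idea (not the paper's exact procedure), the sketch below samples clients independently with inclusion probabilities proportional to their update norms, capped at one, and reweights each transmitted update by 1/p_i so the aggregate stays unbiased; sample_clients, aggregate and the expected budget m are hypothetical names.

```python
import numpy as np

def sample_clients(update_norms, m, rng=None):
    """Choose an expected m clients, with inclusion probability proportional to the
    norm of each client's update (capped at 1), so informative updates are kept."""
    rng = rng or np.random.default_rng()
    norms = np.asarray(update_norms, dtype=float)
    p = np.minimum(1.0, m * norms / norms.sum())   # inclusion probabilities
    chosen = rng.random(len(norms)) < p
    return chosen, p

def aggregate(updates, chosen, p):
    """Unbiased aggregation: every transmitted update is reweighted by 1 / p_i."""
    total = np.zeros_like(updates[0])
    for u, sent, p_i in zip(updates, chosen, p):
        if sent:
            total += u / p_i
    return total / len(updates)
```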
Local SGD: Unified theory and new efficient methods
We present a unified framework for analyzing local SGD methods in the convex and strongly
convex regimes for distributed/federated training of supervised machine learning models …
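A minimal sketch of one local SGD communication round, assuming each worker exposes a stochastic gradient oracle; local_sgd_round, gamma (step size) and H (number of local steps) are illustrative names.

```python
import numpy as np

def local_sgd_round(x, grad_oracles, gamma, H):
    """One communication round of local SGD: every worker starts from the shared
    iterate x, runs H local (stochastic) gradient steps, and the server averages
    the resulting local models into the next shared iterate."""
    finals = []
    for grad_i in grad_oracles:   # grad_i(w) returns worker i's stochastic gradient at w
        w = x.copy()
        for _ in range(H):
            w = w - gamma * grad_i(w)
        finals.append(w)
    return np.mean(finals, axis=0)
```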
Stochastic distributed learning with gradient quantization and double-variance reduction
We consider distributed optimization over several devices, each sending incremental model
updates to a central server. This setting is considered, for instance, in federated learning …
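The shifted-compression idea behind the DIANA-type methods analyzed in this line of work can be sketched as follows, assuming an unbiased Rand-k sparsifier; rand_k, diana_step and the step sizes gamma, alpha are illustrative names. Each device compresses the difference between its gradient and a locally maintained shift, which shrinks the compression error as the shifts learn the local gradients.

```python
import numpy as np

def rand_k(v, k, rng):
    """Unbiased Rand-k sparsifier: keep k random coordinates, rescaled by d / k."""
    out = np.zeros_like(v)
    idx = rng.choice(len(v), size=k, replace=False)
    out[idx] = v[idx] * (len(v) / k)
    return out

def diana_step(x, shifts, grad_oracles, gamma, alpha, k, rng):
    """One round of shifted compression: device i sends rand_k(grad_i(x) - h_i),
    the server forms an unbiased gradient estimate from the shifts plus the
    messages, and both sides move each shift h_i toward grad_i(x)."""
    msgs = [rand_k(g(x) - h, k, rng) for g, h in zip(grad_oracles, shifts)]
    g_hat = np.mean(shifts, axis=0) + np.mean(msgs, axis=0)
    x_new = x - gamma * g_hat
    new_shifts = [h + alpha * m for h, m in zip(shifts, msgs)]
    return x_new, new_shifts
```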
Natural compression for distributed deep learning
Modern deep learning models are often trained in parallel over a collection of distributed
machines to reduce training time. In such settings, communication of model updates among …
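The core operator can be sketched as unbiased randomized rounding of each coordinate to a nearby power of two, keeping the sign; natural_compress is an illustrative name and the sketch ignores exponent packing and other implementation details from the paper.

```python
import numpy as np

def natural_compress(v, rng=None):
    """Unbiased randomized rounding of every coordinate to a power of two, keeping
    the sign: |t| in [2^a, 2^(a+1)] is rounded down with probability
    (2^(a+1) - |t|) / 2^a, which makes the expectation equal to t."""
    rng = rng or np.random.default_rng()
    v = np.asarray(v, dtype=float)
    sign, mag = np.sign(v), np.abs(v)
    out = np.zeros_like(v)
    nz = mag > 0
    a = np.floor(np.log2(mag[nz]))        # exponent of the lower power of two
    low, high = 2.0 ** a, 2.0 ** (a + 1)
    p_down = (high - mag[nz]) / low       # probability of rounding down
    out[nz] = np.where(rng.random(p_down.shape) < p_down, low, high)
    return sign * out
```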
A unified theory of SGD: Variance reduction, sampling, quantization and coordinate descent
In this paper we introduce a unified analysis of a large family of variants of proximal
stochastic gradient descent (SGD) which so far have required different intuitions …
A better alternative to error feedback for communication-efficient distributed learning
Modern large-scale machine learning applications require stochastic optimization
algorithms to be implemented on distributed compute systems. A key bottleneck of such …
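One construction discussed in this line of work combines a biased compressor with an unbiased compressor applied to its residual, which yields an unbiased compressor overall; the sketch below illustrates this for Top-k plus Rand-k, with topk, rand_k and induced_compressor as illustrative names.

```python
import numpy as np

def topk(v, k):
    """Biased Top-k compressor: keep the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

def rand_k(v, k, rng):
    """Unbiased Rand-k compressor: keep k random entries, rescaled by d / k."""
    out = np.zeros_like(v)
    idx = rng.choice(len(v), size=k, replace=False)
    out[idx] = v[idx] * (len(v) / k)
    return out

def induced_compressor(v, k, rng):
    """Send the Top-k part plus an unbiased Rand-k estimate of what Top-k left out;
    the expectation of the message is exactly v, so the combination is unbiased."""
    head = topk(v, k)
    return head + rand_k(v - head, k, rng)
```

Because the residual is compressed with an unbiased operator, the transmitted message is an unbiased estimate of the original vector, so it can be used by methods that do not maintain an error-feedback buffer.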
EF-BV: A unified theory of error feedback and variance reduction mechanisms for biased and unbiased compression in distributed optimization
In distributed or federated optimization and learning, communication between the different
computing units is often the bottleneck and gradient compression is widely used to reduce …