From federated learning to federated neural architecture search: a survey
Federated learning is a recently proposed distributed machine learning paradigm for privacy
preservation, which has found a wide range of applications where data privacy is of primary …
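The federated learning setup described in this entry is, at its simplest, periodic local training plus server-side model averaging. Below is a minimal sketch of that idea only; the least-squares client objective and the function names are illustrative assumptions, not the survey's own algorithm.

import numpy as np

def client_update(w, data, lr=0.1, local_steps=5):
    # Hypothetical local objective: least-squares on the client's (X, y) data.
    X, y = data
    for _ in range(local_steps):
        grad = X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def federated_averaging(clients, w, rounds=10):
    # One round: every client trains locally on its own data, then the server
    # averages the returned models, weighted by local dataset size.
    for _ in range(rounds):
        sizes = np.array([len(y) for _, y in clients], dtype=float)
        local = [client_update(w.copy(), d) for d in clients]
        w = sum(n * wl for n, wl in zip(sizes, local)) / sizes.sum()
    return w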
Communication-efficient distributed deep learning: A comprehensive survey
Distributed deep learning (DL) has become prevalent in recent years to reduce training time
by leveraging multiple computing devices (e.g., GPUs/TPUs) due to larger models and …
A field guide to federated optimization
Federated learning and analytics are a distributed approach for collaboratively learning
models (or statistics) from decentralized data, motivated by and designed for privacy …
ProxSkip: Yes! Local gradient steps provably lead to communication acceleration! Finally!
We introduce ProxSkip—a surprisingly simple and provably efficient method for minimizing
the sum of a smooth ($f$) and an expensive nonsmooth proximable ($\psi$) function. The …
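As a rough sketch of the kind of method this entry describes (minimizing f(x) + psi(x) while only occasionally calling the proximal operator of psi), a ProxSkip-style loop can be written as below. The control variable h, the probability p of taking the proximal step, and the names grad_f/prox_psi are my paraphrase of the standard form, not a verified transcription of the paper.

import numpy as np

def proxskip(grad_f, prox_psi, x0, gamma, p, iters=1000, seed=0):
    # Sketch of a ProxSkip-style loop for min_x f(x) + psi(x).
    # prox_psi(v, t) is assumed to return prox_{t*psi}(v).
    # The proximal step is taken only with probability p ("skip" otherwise);
    # the control variable h compensates for the skipped steps.
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    h = np.zeros_like(x)
    for _ in range(iters):
        x_hat = x - gamma * (grad_f(x) - h)
        if rng.random() < p:
            x_new = prox_psi(x_hat - (gamma / p) * h, gamma / p)
        else:
            x_new = x_hat
        h = h + (p / gamma) * (x_new - x_hat)
        x = x_new
    return x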
Federated learning with hierarchical clustering of local updates to improve training on non-IID data
Federated learning (FL) is a well-established method for performing machine learning tasks
over massively distributed data. However, in settings where data is distributed in a non-IID …
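The idea in this entry, as far as the snippet shows, is to group clients whose local updates look similar and aggregate within groups. A minimal sketch of that grouping step follows; the cosine distance, average linkage, and threshold value are illustrative assumptions rather than the paper's exact procedure.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import pdist

def cluster_client_updates(updates, threshold=0.5):
    # updates: list of flattened client model updates (1-D numpy arrays).
    # Agglomerative (hierarchical) clustering on pairwise cosine distance:
    # clients whose updates point in similar directions share a cluster,
    # and each cluster can then be aggregated (e.g., averaged) separately.
    U = np.stack(updates)
    Z = linkage(pdist(U, metric="cosine"), method="average")
    return fcluster(Z, t=threshold, criterion="distance")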
Tighter theory for local SGD on identical and heterogeneous data
We provide a new analysis of local SGD, removing unnecessary assumptions and
elaborating on the difference between two data regimes: identical and heterogeneous. In …
Local SGD converges fast and communicates little
S. U. Stich, arXiv preprint arXiv:1805.09767, 2018
Mini-batch stochastic gradient descent (SGD) is the state of the art in large-scale distributed
training. The scheme can reach a linear speedup with respect to the number of workers, but …
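The local SGD scheme in this entry (and the model-averaging view in the next one) boils down to workers taking several independent SGD steps and only occasionally averaging their iterates. A toy sketch, with hypothetical per-worker gradient oracles, is given below.

import numpy as np

def local_sgd(grad_fns, x0, lr=0.05, local_steps=10, rounds=20):
    # grad_fns[i] is a (possibly stochastic) gradient oracle for worker i.
    # Each worker runs `local_steps` SGD steps on its own copy of the model;
    # communication happens only once per round, when iterates are averaged.
    workers = [np.asarray(x0, dtype=float).copy() for _ in grad_fns]
    avg = np.asarray(x0, dtype=float)
    for _ in range(rounds):
        for i, g in enumerate(grad_fns):
            for _ in range(local_steps):
                workers[i] = workers[i] - lr * g(workers[i])
        avg = sum(workers) / len(workers)
        workers = [avg.copy() for _ in grad_fns]
    return avg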
Parallel restarted SGD with faster convergence and less communication: Demystifying why model averaging works for deep learning
In distributed training of deep neural networks, parallel mini-batch SGD is widely used to
speed up the training process. It uses multiple workers to sample …
Sharper convergence guarantees for asynchronous SGD for distributed and federated learning
We study the asynchronous stochastic gradient descent algorithm for distributed training
over $n$ workers that might be heterogeneous. In this algorithm, workers compute …
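A toy serial simulation of the asynchronous scheme this abstract describes: each worker's gradient is computed at a possibly stale copy of the model, and the server applies updates one at a time as they arrive. The round-robin arrival order and fixed per-worker staleness are simplifying assumptions for illustration only.

import numpy as np

def async_sgd_sim(grad_fns, x0, lr=0.05, steps=200, delays=None):
    # grad_fns[i]: stochastic gradient oracle for worker i.
    # delays[i]: how many server updates old the model copy was when
    # worker i computed its gradient (its staleness).
    n = len(grad_fns)
    delays = delays if delays is not None else list(range(n))
    x = np.asarray(x0, dtype=float)
    history = [x.copy()]
    for t in range(steps):
        i = t % n                                   # next gradient to arrive
        stale = history[max(0, len(history) - 1 - delays[i])]
        x = x - lr * grad_fns[i](stale)             # apply the stale gradient
        history.append(x.copy())
    return x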