From federated learning to federated neural architecture search: a survey

H Zhu, H Zhang, Y Jin - Complex & Intelligent Systems, 2021 - Springer
Federated learning is a recently proposed distributed machine learning paradigm for privacy
preservation, which has found a wide range of applications where data privacy is of primary …

Communication-efficient distributed deep learning: A comprehensive survey

Z Tang, S Shi, W Wang, B Li, X Chu - arXiv preprint arXiv:2003.06307, 2020 - arxiv.org
Distributed deep learning (DL) has become prevalent in recent years to reduce training time
by leveraging multiple computing devices (e.g., GPUs/TPUs) due to larger models and …

Federated learning on non-IID data: A survey

H Zhu, J Xu, S Liu, Y Jin - Neurocomputing, 2021 - Elsevier
Federated learning is an emerging distributed machine learning framework for privacy
preservation. However, models trained in federated learning usually have worse …

A field guide to federated optimization

J Wang, Z Charles, Z Xu, G Joshi, HB McMahan… - arXiv preprint arXiv …, 2021 - arxiv.org
Federated learning and analytics are a distributed approach for collaboratively learning
models (or statistics) from decentralized data, motivated by and designed for privacy …
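The entries above and below revolve around the federated averaging pattern: a server broadcasts a global model, clients train locally on their own data, and the server aggregates the results. A minimal sketch of that loop follows; the linear model, weighting scheme, and hyperparameters are illustrative assumptions, not taken from any of the cited papers.

```python
import numpy as np

def local_sgd_update(weights, data, labels, lr=0.1, epochs=1):
    """One client's local update: a few gradient steps on a linear least-squares model (illustrative)."""
    w = weights.copy()
    for _ in range(epochs):
        preds = data @ w
        grad = data.T @ (preds - labels) / len(labels)  # least-squares gradient
        w -= lr * grad
    return w

def federated_averaging(global_w, clients, rounds=10):
    """Server loop: broadcast the global model, collect local updates, average them weighted by data size."""
    for _ in range(rounds):
        updates, sizes = [], []
        for data, labels in clients:
            updates.append(local_sgd_update(global_w, data, labels))
            sizes.append(len(labels))
        sizes = np.array(sizes, dtype=float)
        global_w = sum(s * u for s, u in zip(sizes / sizes.sum(), updates))
    return global_w

# Toy usage: three clients with differently distributed (non-IID) features.
rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0])
clients = []
for shift in (0.0, 2.0, -2.0):
    X = rng.normal(shift, 1.0, size=(50, 2))
    y = X @ true_w + 0.1 * rng.normal(size=50)
    clients.append((X, y))
print(federated_averaging(np.zeros(2), clients))
```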

ProxSkip: Yes! Local gradient steps provably lead to communication acceleration! Finally!

K Mishchenko, G Malinovsky, S Stich… - International …, 2022 - proceedings.mlr.press
We introduce ProxSkip—a surprisingly simple and provably efficient method for minimizing
the sum of a smooth ($f$) and an expensive nonsmooth proximable ($\psi$) function. The …
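For readers unfamiliar with the composite objective in this abstract, here is a plain proximal gradient descent sketch for $\min_x f(x) + \psi(x)$ with $\psi = \lambda\|\cdot\|_1$. It only illustrates the setting; ProxSkip's actual contribution (provably skipping most prox evaluations to accelerate communication) is not reproduced here, and all values below are made up.

```python
import numpy as np

def soft_threshold(x, tau):
    """Prox of tau * ||x||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - tau, 0.0)

def proximal_gradient(grad_f, prox_psi, x0, step=0.01, iters=500):
    """Standard proximal gradient descent on f(x) + psi(x): gradient step on f, then prox of psi."""
    x = x0.copy()
    for _ in range(iters):
        x = prox_psi(x - step * grad_f(x), step)
    return x

# Toy example: f(x) = 0.5 * ||Ax - b||^2, psi(x) = lam * ||x||_1.
rng = np.random.default_rng(1)
A = rng.normal(size=(30, 10))
b = rng.normal(size=30)
lam = 0.5
x = proximal_gradient(
    grad_f=lambda x: A.T @ (A @ x - b),
    prox_psi=lambda v, step: soft_threshold(v, step * lam),
    x0=np.zeros(10),
)
print(x)
```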

Federated learning with hierarchical clustering of local updates to improve training on non-IID data

C Briggs, Z Fan, P Andras - 2020 International Joint Conference …, 2020 - ieeexplore.ieee.org
Federated learning (FL) is a well-established method for performing machine learning tasks
over massively distributed data. However, in settings where data is distributed in a non-IID …
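A rough sketch of the clustering idea in the title: group clients by the similarity of their flattened model updates, then train per-cluster models. The distance metric, linkage, and threshold below are assumptions for illustration, not the paper's exact configuration.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

def cluster_client_updates(updates, threshold=1.0):
    """Group clients whose (flattened) model updates are similar.

    updates: list of 1-D numpy arrays, one per client (e.g. weight deltas after local training).
    Returns an array of cluster labels, one per client.
    """
    X = np.stack(updates)                 # shape (num_clients, num_params)
    Z = linkage(X, method="ward")         # agglomerative (hierarchical) clustering
    return fcluster(Z, t=threshold, criterion="distance")

# Toy usage: two groups of clients whose updates point in different directions.
rng = np.random.default_rng(2)
group_a = [np.array([1.0, 0.0]) + 0.05 * rng.normal(size=2) for _ in range(5)]
group_b = [np.array([0.0, 1.0]) + 0.05 * rng.normal(size=2) for _ in range(5)]
print(cluster_client_updates(group_a + group_b, threshold=0.5))
```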

Tighter theory for local SGD on identical and heterogeneous data

A Khaled, K Mishchenko… - … Conference on Artificial …, 2020 - proceedings.mlr.press
We provide a new analysis of local SGD, removing unnecessary assumptions and
elaborating on the difference between two data regimes: identical and heterogeneous. In …

Local SGD converges fast and communicates little

SU Stich - arXiv preprint arXiv:1805.09767, 2018 - arxiv.org
Mini-batch stochastic gradient descent (SGD) is the state of the art in large-scale distributed
training. The scheme can reach a linear speedup with respect to the number of workers, but …
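The two local SGD entries above analyze the same basic scheme: each worker takes several SGD steps on its own data, and models are averaged only at synchronization points. A small simulation of that loop under assumed settings (least-squares objective, fixed step size, synthetic heterogeneous data):

```python
import numpy as np

def local_sgd(worker_data, w0, local_steps=10, sync_rounds=20, lr=0.05):
    """Each worker runs `local_steps` SGD steps on its own data, then all models are averaged."""
    workers = [w0.copy() for _ in worker_data]
    for _ in range(sync_rounds):
        for k, (X, y) in enumerate(worker_data):
            w = workers[k]
            for _ in range(local_steps):
                i = np.random.randint(len(y))        # sample one data point
                grad = (X[i] @ w - y[i]) * X[i]      # stochastic least-squares gradient
                w = w - lr * grad
            workers[k] = w
        avg = np.mean(workers, axis=0)               # communication round: average models
        workers = [avg.copy() for _ in workers]
    return workers[0]

# Toy usage: heterogeneous workers (different input distributions), shared target model.
rng = np.random.default_rng(3)
true_w = np.array([0.5, -1.5])
worker_data = []
for shift in (-1.0, 0.0, 1.0):
    X = rng.normal(shift, 1.0, size=(100, 2))
    worker_data.append((X, X @ true_w))
print(local_sgd(worker_data, np.zeros(2)))
```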

Parallel restarted SGD with faster convergence and less communication: Demystifying why model averaging works for deep learning

H Yu, S Yang, S Zhu - Proceedings of the AAAI Conference on Artificial …, 2019 - ojs.aaai.org
In distributed training of deep neural networks, parallel minibatch SGD is widely used to
speed up the training process. It uses multiple workers to sample …
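In equation form, the periodic model-averaging ("restart") rule this abstract refers to, as I read it (notation is mine): each of $N$ workers runs SGD on its local copy and, every $I$ iterations, all copies are reset to their average.

```latex
\[
x^{(k)}_{t+1} =
\begin{cases}
\dfrac{1}{N} \displaystyle\sum_{j=1}^{N} \bigl( x^{(j)}_t - \eta\, g^{(j)}_t \bigr), & \text{if } (t+1) \bmod I = 0,\\[2ex]
x^{(k)}_t - \eta\, g^{(k)}_t, & \text{otherwise,}
\end{cases}
\]
```
where $g^{(k)}_t$ is the stochastic gradient computed by worker $k$ at its own iterate and $\eta$ is the step size.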

Sharper convergence guarantees for asynchronous SGD for distributed and federated learning

A Koloskova, SU Stich, M Jaggi - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the asynchronous stochastic gradient descent algorithm for distributed training
over $n$ workers that might be heterogeneous. In this algorithm, workers compute …
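A single-process simulation of the asynchronous pattern studied here: applied gradients were computed on possibly stale copies of the parameters. The delay model and quadratic objective below are made up for illustration; they are not the paper's setting.

```python
import numpy as np

def async_sgd_simulation(grad_fn, x0, steps=200, max_delay=5, lr=0.05):
    """Simulate asynchronous SGD: each applied gradient was computed on parameters
    that are up to `max_delay` server steps old (staleness)."""
    rng = np.random.default_rng(4)
    history = [x0.copy()]                                     # past iterates, to look up stale parameters
    x = x0.copy()
    for _ in range(steps):
        delay = rng.integers(0, min(max_delay, len(history))) # staleness of this worker's read
        stale_x = history[-1 - delay]                         # parameters the worker actually saw
        x = x - lr * grad_fn(stale_x)                         # server applies the (stale) gradient
        history.append(x.copy())
    return x

# Toy usage: minimize f(x) = 0.5 * ||x - target||^2 despite stale gradients.
target = np.array([1.0, 2.0, 3.0])
print(async_sgd_simulation(lambda x: x - target, np.zeros(3)))
```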