Variance reduced ProxSkip: Algorithm, theory and application to federated learning

G Malinovsky, K Yi, P Richtárik - Advances in Neural …, 2022 - proceedings.neurips.cc
We study distributed optimization methods based on the local training (LT) paradigm, i.e.,
methods which achieve communication efficiency by performing richer local gradient …

ProxSkip: Yes! Local gradient steps provably lead to communication acceleration! Finally!

K Mishchenko, G Malinovsky, S Stich… - International …, 2022 - proceedings.mlr.press
We introduce ProxSkip, a surprisingly simple and provably efficient method for minimizing
the sum of a smooth ($f$) and an expensive nonsmooth proximable ($\psi$) function. The …
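
As a rough illustration of the prox-skipping idea described in this snippet, a minimal single-machine sketch follows. The $\gamma/p$ scaling of the prox step and the control-variate update are my reading of the method, and `grad_f`/`prox_psi` are placeholder callables for $\nabla f$ and the prox operator of $\psi$; treat this as a sketch rather than the authors' reference implementation.

```python
import numpy as np

def proxskip_sketch(grad_f, prox_psi, x0, gamma, p, num_iters, rng=None):
    """Prox-skipping loop: take a cheap gradient step on f every iteration,
    but evaluate the expensive prox of psi only with probability p, using a
    control variate h to compensate for the skipped prox calls.
    prox_psi(point, step) is assumed to return prox_{step * psi}(point)."""
    rng = rng or np.random.default_rng(0)
    x, h = x0.copy(), np.zeros_like(x0)
    for _ in range(num_iters):
        x_hat = x - gamma * (grad_f(x) - h)        # gradient step shifted by control variate
        if rng.random() < p:                       # rarely: apply the expensive prox
            x_new = prox_psi(x_hat - (gamma / p) * h, gamma / p)
        else:                                      # usually: skip the prox entirely
            x_new = x_hat
        h = h + (p / gamma) * (x_new - x_hat)      # update the control variate
        x = x_new
    return x
```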

Adaptive personalized federated learning

Y Deng, MM Kamani, M Mahdavi - arXiv preprint arXiv:2003.13461, 2020 - arxiv.org
Investigation of the degree of personalization in federated learning algorithms has shown
that only maximizing the performance of the global model will confine the capacity of the …

Federated learning of a mixture of global and local models

F Hanzely, P Richtárik - arXiv preprint arXiv:2002.05516, 2020 - arxiv.org
We propose a new optimization formulation for training federated learning models. The
standard formulation has the form of an empirical risk minimization problem constructed to …
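
The snippet cuts off before the proposed formulation. For orientation, the standard formulation it contrasts with is the usual federated ERM objective, and a mixture formulation of the kind the title suggests adds a penalty pulling each client's local model toward their average; the second objective below is my reading of the paper (with $f_i$ the local loss of client $i$ and $\lambda \ge 0$ controlling the degree of consensus), not a quotation from it.

\[
\min_{x \in \mathbb{R}^d} \ \frac{1}{n}\sum_{i=1}^{n} f_i(x)
\qquad\text{vs.}\qquad
\min_{x_1,\dots,x_n \in \mathbb{R}^d} \ \frac{1}{n}\sum_{i=1}^{n} f_i(x_i)
+ \frac{\lambda}{2n}\sum_{i=1}^{n} \|x_i - \bar{x}\|^2,
\quad \bar{x} := \frac{1}{n}\sum_{i=1}^{n} x_i.
\]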

Where to begin? On the impact of pre-training and initialization in federated learning

J Nguyen, J Wang, K Malik, M Sanjabi… - arXiv preprint arXiv …, 2022 - arxiv.org
An oft-cited challenge of federated learning is the presence of heterogeneity. Data
heterogeneity refers to the fact that data from different clients may follow very different …

FedAvg with fine tuning: Local updates lead to representation learning

L Collins, H Hassani, A Mokhtari… - Advances in Neural …, 2022 - proceedings.neurips.cc
The Federated Averaging (FedAvg) algorithm, which consists of alternating
between a few local stochastic gradient updates at client nodes, followed by a model …
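
The alternation the abstract describes (a few local SGD steps per client, then server-side averaging) can be sketched as a single communication round. Client weighting by dataset size, mini-batching, and partial participation are omitted, and the function and parameter names below are illustrative rather than taken from the paper.

```python
import numpy as np

def fedavg_round(w_global, client_data, local_grad, lr=0.1, local_steps=5):
    """One FedAvg-style communication round: each client runs a few local
    gradient steps starting from the current global model, and the server
    averages the resulting client models (uniform weights for simplicity)."""
    client_models = []
    for data in client_data:
        w = w_global.copy()
        for _ in range(local_steps):
            w -= lr * local_grad(w, data)    # local (stochastic) gradient step
        client_models.append(w)
    return np.mean(client_models, axis=0)    # server-side model averaging
```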

Mime: Mimicking centralized stochastic algorithms in federated learning

SP Karimireddy, M Jaggi, S Kale, M Mohri… - arXiv preprint arXiv …, 2020 - arxiv.org
Federated learning (FL) is a challenging setting for optimization due to the heterogeneity of
the data across different clients which gives rise to the client drift phenomenon. In fact …
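
To make the client drift phenomenon concrete, here is a toy example with two made-up quadratic client losses: after many local steps each client converges toward its own optimum, so the averaged model is biased away from the global minimizer whenever the clients are not symmetric.

```python
# Toy illustration of client drift with two quadratic clients
# f1(w) = a*(w - 1)^2 / 2 and f2(w) = b*(w + 1)^2 / 2 (made-up losses).
# The minimizer of (f1 + f2)/2 is (a - b)/(a + b), but long local runs push
# each client to its own optimum (+1 or -1), biasing the averaged model.
a, b, lr, local_steps = 4.0, 1.0, 0.1, 100
w_star = (a - b) / (a + b)                    # global optimum = 0.6

w1 = w2 = 0.0                                 # both clients start from the same global model
for _ in range(local_steps):
    w1 -= lr * a * (w1 - 1.0)                 # local steps on client 1's loss
    w2 -= lr * b * (w2 + 1.0)                 # local steps on client 2's loss

print(f"global optimum {w_star:.3f}, averaged model {(w1 + w2) / 2:.3f}")
# -> the averaged model ends up near 0.0, far from the true optimum 0.6
```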

Communication-efficient and distributed learning over wireless networks: Principles and applications

J Park, S Samarakoon, A Elgabli, J Kim… - Proceedings of the …, 2021 - ieeexplore.ieee.org
Machine learning (ML) is a promising enabler for the fifth-generation (5G) communication
systems and beyond. By imbuing intelligence into the network edge, edge nodes can …

Cooperative SGD: A unified framework for the design and analysis of local-update SGD algorithms

J Wang, G Joshi - Journal of Machine Learning Research, 2021 - jmlr.org
When training machine learning models using stochastic gradient descent (SGD) with a
large number of nodes or massive edge devices, the communication cost of synchronizing …

Distributionally robust federated averaging

Y Deng, MM Kamani… - Advances in neural …, 2020 - proceedings.neurips.cc
In this paper, we study communication efficient distributed algorithms for distributionally
robust federated learning via periodic averaging with adaptive sampling. In contrast to …
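
A distributionally robust variant of federated averaging typically targets a min-max objective over mixtures of client losses rather than their plain average. The formulation below is the standard agnostic/DRO objective and is an assumption about the setting here rather than a quotation from the paper, with $f_i$ the loss of client $i$ and $\Delta_n$ the probability simplex over clients.

\[
\min_{w \in \mathbb{R}^d} \ \max_{\lambda \in \Delta_n} \ \sum_{i=1}^{n} \lambda_i f_i(w),
\qquad
\Delta_n := \Big\{ \lambda \in \mathbb{R}^n_{\ge 0} : \textstyle\sum_{i=1}^{n} \lambda_i = 1 \Big\}.
\]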