Tighter theory for local SGD on identical and heterogeneous data

A Khaled, K Mishchenko… - … conference on artificial …, 2020 - proceedings.mlr.press
We provide a new analysis of local SGD, removing unnecessary assumptions and
elaborating on the difference between two data regimes: identical and heterogeneous. In …
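Below is a minimal sketch of the local SGD scheme this paper analyzes: each client runs several SGD steps on its own data, then the models are averaged. The synthetic least-squares problem, client count, and step sizes are illustrative assumptions, not the authors' setup or code.

# Minimal local SGD sketch on a synthetic heterogeneous least-squares problem.
# All names and constants are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
M, n, d = 4, 64, 10          # clients, samples per client, dimension
K, rounds, lr = 5, 50, 0.05  # local steps, communication rounds, step size

# Heterogeneous data regime: each client gets its own (A_i, b_i).
A = [rng.normal(size=(n, d)) for _ in range(M)]
b = [A[i] @ rng.normal(size=d) + 0.1 * rng.normal(size=n) for i in range(M)]

def stoch_grad(i, x):
    # Stochastic gradient of 0.5*||A_i x - b_i||^2 / n from one random sample.
    j = rng.integers(n)
    return A[i][j] * (A[i][j] @ x - b[i][j])

x = np.zeros(d)                      # shared model
for r in range(rounds):
    local = [x.copy() for _ in range(M)]
    for i in range(M):               # each client runs K local SGD steps
        for _ in range(K):
            local[i] -= lr * stoch_grad(i, local[i])
    x = np.mean(local, axis=0)       # communication: average the local models

loss = sum(0.5 * np.linalg.norm(A[i] @ x - b[i])**2 / n for i in range(M)) / M
print(f"average loss after {rounds} rounds: {loss:.4f}")

The identical-data regime discussed in the paper corresponds to all clients sharing the same (A_i, b_i); the heterogeneous regime, sketched above, gives each client its own data.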

FedPD: A federated learning framework with adaptivity to non-IID data

X Zhang, M Hong, S Dhople, W Yin… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Federated Learning (FL) is popular for communication-efficient learning from distributed
data. To utilize data at different clients without moving them to the cloud, algorithms such as …

MARINA: Faster non-convex distributed learning with compression

E Gorbunov, KP Burlachenko, Z Li… - … on Machine Learning, 2021 - proceedings.mlr.press
We develop and analyze MARINA: a new communication-efficient method for non-convex
distributed learning over heterogeneous datasets. MARINA employs a novel communication …
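The sketch below illustrates the compressed gradient-difference update that MARINA-style methods are built on: with a small probability all workers send full gradients, otherwise each sends a compressed difference of consecutive gradients. The quadratic local objectives and the Rand-k compressor are assumed for illustration; this is not the authors' implementation.

# MARINA-style step sketch: rare full-gradient rounds, otherwise compressed
# gradient differences. Illustrative assumptions throughout.
import numpy as np

rng = np.random.default_rng(1)
n_workers, d = 5, 20
p, gamma, k, steps = 0.2, 0.05, 2, 200

# Heterogeneous quadratic objectives f_i(x) = 0.5*||x - c_i||^2.
c = [rng.normal(size=d) for _ in range(n_workers)]
grad = lambda i, x: x - c[i]

def rand_k(v):
    # Unbiased Rand-k compressor: keep k random coordinates, rescale by d/k.
    idx = rng.choice(d, size=k, replace=False)
    out = np.zeros_like(v)
    out[idx] = v[idx] * d / k
    return out

x = np.zeros(d)
g = [grad(i, x) for i in range(n_workers)]        # initial gradient estimators
for _ in range(steps):
    x_new = x - gamma * np.mean(g, axis=0)        # server step
    full_sync = rng.random() < p                  # shared coin flip
    for i in range(n_workers):
        if full_sync:                             # rare uncompressed round
            g[i] = grad(i, x_new)
        else:                                     # cheap compressed round
            g[i] = g[i] + rand_k(grad(i, x_new) - grad(i, x))
    x = x_new

print("distance to optimum:", np.linalg.norm(x - np.mean(c, axis=0)))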

Federated learning under arbitrary communication patterns

D Avdiukhin… - … Conference on Machine …, 2021 - proceedings.mlr.press
Federated Learning is a distributed learning setting where the goal is to train a centralized
model with training data distributed over a large number of heterogeneous clients, each with …

STEM: A stochastic two-sided momentum algorithm achieving near-optimal sample and communication complexities for federated learning

P Khanduri, P Sharma, H Yang… - Advances in …, 2021 - proceedings.neurips.cc
Federated Learning (FL) refers to the paradigm where multiple worker nodes (WNs) build a
joint model by using local data. Despite extensive research, for a generic non-convex FL …

FedCluster: Boosting the convergence of federated learning via cluster-cycling

C Chen, Z Chen, Y Zhou… - 2020 IEEE International …, 2020 - ieeexplore.ieee.org
We develop FedCluster, a novel federated learning framework with improved optimization
efficiency, and investigate its theoretical convergence properties. FedCluster groups the …

Bias-variance reduced local SGD for less heterogeneous federated learning

T Murata, T Suzuki - arxiv preprint arxiv:2102.03198, 2021 - arxiv.org
Recently, local SGD has received much attention and has been extensively studied in the distributed
learning community to overcome the communication bottleneck problem. However, the …

DASHA: Distributed nonconvex optimization with communication compression, optimal oracle complexity, and no client synchronization

A Tyurin, P Richtárik - arxiv preprint arxiv:2202.01268, 2022 - arxiv.org
We develop and analyze DASHA: a new family of methods for nonconvex distributed
optimization problems. When the local functions at the nodes have a finite-sum or an …

Local methods with adaptivity via scaling

S Chezhegov, S Skorik, N Khachaturov… - arxiv preprint arxiv …, 2024 - arxiv.org
The rapid development of machine learning and deep learning has introduced increasingly
complex optimization challenges that must be addressed. Indeed, training modern …

Quantized FedPD (QFedPD): Beyond Conventional Wisdom – The Energy Benefits of Frequent Communication

A Elgabli, CB Issaid, M Badi… - IEEE Internet of Things …, 2024 - ieeexplore.ieee.org
Federated averaging (FedAvg) is a well-recognized framework for distributed learning that
efficiently manages communication. Several algorithms have emerged to enhance the …