Federated optimization algorithms with random reshuffling and gradient compression

A Sadiev, G Malinovsky, E Gorbunov, I Sokolov… - arXiv preprint arXiv …, 2022 - arxiv.org
Gradient compression is a popular technique for improving the communication complexity of
stochastic first-order methods in distributed training of machine learning models. However …
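
For context, "gradient compression" in this line of work means replacing each worker's dense gradient with a cheap-to-communicate surrogate before it is sent to the server. Below is a minimal NumPy sketch of two standard compressors, biased Top-K and unbiased Rand-K; the function names are illustrative and this is not the specific compressor analyzed in the paper.

```python
import numpy as np

def top_k(grad: np.ndarray, k: int) -> np.ndarray:
    """Biased Top-K sparsifier: keep the k largest-magnitude coordinates,
    zero the rest. Only k values and their indices need to be communicated."""
    compressed = np.zeros_like(grad)
    idx = np.argpartition(np.abs(grad), -k)[-k:]   # indices of the k largest entries
    compressed[idx] = grad[idx]
    return compressed

def rand_k(grad: np.ndarray, k: int, rng: np.random.Generator) -> np.ndarray:
    """Unbiased Rand-K sparsifier: keep k random coordinates, rescaled by d/k
    so that E[rand_k(g)] = g."""
    d = grad.size
    compressed = np.zeros_like(grad)
    idx = rng.choice(d, size=k, replace=False)
    compressed[idx] = grad[idx] * (d / k)
    return compressed

# Example: compress a random gradient down to 3 coordinates.
rng = np.random.default_rng(0)
g = rng.standard_normal(10)
print(top_k(g, 3))
print(rand_k(g, 3, rng))
```

Top-K keeps the most informative coordinates but is biased; Rand-K is unbiased thanks to the d/k rescaling, which is the property most compression analyses rely on.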

Communication acceleration of local gradient methods via an accelerated primal-dual algorithm with an inexact prox

A Sadiev, D Kovalev… - Advances in Neural …, 2022 - proceedings.neurips.cc
Inspired by a recent breakthrough of Mishchenko et al. [2022], who for the first time showed
that local gradient steps can lead to provable communication acceleration, we propose an …

Improving accelerated federated learning with compression and importance sampling

M Grudzień, G Malinovsky, P Richtárik - arXiv preprint arXiv:2306.03240, 2023 - arxiv.org
Federated Learning is a collaborative training framework that leverages heterogeneous data
distributed across a vast number of clients. Since it is practically infeasible to request and …
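
To illustrate the importance-sampling ingredient named in the title: instead of polling every client, the server can sample a few clients with non-uniform probabilities and reweight their updates so the aggregate stays unbiased. The sketch below is a generic importance-sampled aggregator with dataset-size weights assumed for illustration; it is not the particular sampler proposed in the paper.

```python
import numpy as np

def sample_clients(weights, num_sampled, rng):
    """Sample client indices with probability proportional to an importance
    weight (purely illustrative here, e.g. local dataset size)."""
    probs = np.asarray(weights, dtype=float)
    probs = probs / probs.sum()
    chosen = rng.choice(len(probs), size=num_sampled, replace=True, p=probs)
    return chosen, probs

def aggregate(updates, chosen, probs):
    """Importance-weighted average: dividing each sampled update by n * p_i
    makes this an unbiased estimate of the plain average over all n clients."""
    n = len(probs)
    return np.mean([updates[i] / (n * probs[i]) for i in chosen], axis=0)

rng = np.random.default_rng(1)
n, d = 8, 4
updates = [rng.standard_normal(d) for _ in range(n)]   # one local update per client
dataset_sizes = rng.integers(10, 100, size=n)          # illustrative importance weights
chosen, probs = sample_clients(dataset_sizes, num_sampled=3, rng=rng)
print(aggregate(updates, chosen, probs))
```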

Searching for optimal per-coordinate step-sizes with multidimensional backtracking

F Kunstner, V Sanches Portella… - Advances in Neural …, 2023 - proceedings.neurips.cc
The backtracking line-search is an effective technique to automatically tune the step-size in
smooth optimization. It guarantees similar performance to using the theoretically optimal …
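
For reference, the scalar backtracking line search that the paper generalizes works as follows: start from an optimistic step-size and shrink it geometrically until the Armijo sufficient-decrease condition holds. A minimal sketch is below (names and constants are illustrative; the paper's contribution is a multidimensional, per-coordinate search, which this baseline does not implement).

```python
import numpy as np

def backtracking_step_size(f, grad_f, x, alpha0=1.0, beta=0.5, c=1e-4, max_iter=50):
    """Classic scalar backtracking line search with the Armijo condition:
    shrink alpha until f(x - alpha * g) <= f(x) - c * alpha * ||g||^2."""
    g = grad_f(x)
    fx = f(x)
    sq_norm = np.dot(g, g)
    alpha = alpha0
    for _ in range(max_iter):
        if f(x - alpha * g) <= fx - c * alpha * sq_norm:
            break
        alpha *= beta
    return alpha

# Example on a simple ill-conditioned quadratic f(x) = 0.5 * x^T A x.
A = np.diag([1.0, 100.0])
f = lambda x: 0.5 * x @ A @ x
grad_f = lambda x: A @ x
x = np.array([1.0, 1.0])
print("accepted step-size:", backtracking_step_size(f, grad_f, x))
```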

Fine-tuning language models over slow networks using activation quantization with guarantees

J Wang, B Yuan, L Rimanic, Y He… - Advances in …, 2022 - proceedings.neurips.cc
Communication compression is a crucial technique for modern distributed learning systems
to alleviate their communication bottlenecks over slower networks. Despite recent intensive …
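
As a rough illustration of the kind of activation compression this paper studies, the sketch below uniformly quantizes an activation tensor to a few bits with stochastic rounding and then dequantizes it. This is a generic quantizer written for clarity, not the specific guaranteed scheme from the paper; all names and bit-widths are illustrative.

```python
import numpy as np

def quantize_activations(a, num_bits=4):
    """Uniform stochastic quantization of an activation tensor to num_bits.
    Returns integer codes plus the (scale, offset) needed to dequantize;
    only these compact quantities would be sent over the slow network."""
    levels = 2 ** num_bits - 1
    a_min, a_max = float(a.min()), float(a.max())
    scale = (a_max - a_min) / levels if a_max > a_min else 1.0
    normalized = (a - a_min) / scale
    # Stochastic rounding keeps the quantizer unbiased in expectation.
    noise = np.random.uniform(0.0, 1.0, size=a.shape)
    codes = np.clip(np.floor(normalized + noise), 0, levels).astype(np.uint8)
    return codes, scale, a_min

def dequantize_activations(codes, scale, a_min):
    """Reconstruct an approximation of the original activations."""
    return codes.astype(np.float32) * scale + a_min

a = np.random.randn(2, 8).astype(np.float32)
codes, scale, a_min = quantize_activations(a, num_bits=4)
a_hat = dequantize_activations(codes, scale, a_min)
print("max reconstruction error:", np.abs(a - a_hat).max())
```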

Knowledge distillation performs partial variance reduction

M Safaryan, A Peste, D Alistarh - Advances in Neural …, 2023 - proceedings.neurips.cc
Knowledge distillation is a popular approach for enhancing the performance of "student"
models, with lower representational capacity, by taking advantage of more …

Distributed optimization for overparameterized problems: Achieving optimal dimension independent communication complexity

B Song, I Tsaknakis, CY Yau… - Advances in Neural …, 2022 - proceedings.neurips.cc
Decentralized optimization is playing an important role in applications such as training
large machine learning models, among others. Despite its superior practical performance …

FedP3: Federated Personalized and Privacy-friendly Network Pruning under Model Heterogeneity

K Yi, N Gazagnadou, P Richtárik, L Lyu - arXiv preprint arXiv:2404.09816, 2024 - arxiv.org
The interest in federated learning has surged in recent research due to its unique ability to
train a global model using privacy-secured information held locally on each client. This …

GradSkip: Communication-accelerated local gradient methods with better computational complexity

A Maranjyan, M Safaryan, P Richtárik - arXiv preprint arXiv:2210.16402, 2022 - arxiv.org
We study a class of distributed optimization algorithms that aim to alleviate high
communication costs by allowing the clients to perform multiple local gradient-type training …
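
To make "multiple local gradient-type training steps" concrete, here is a bare-bones sketch of local gradient descent with periodic averaging: each client takes several local steps and communication happens once per round. This is only the baseline pattern, not the GradSkip algorithm itself (which additionally lets clients skip local computation with client-specific probabilities); the objectives and constants are illustrative.

```python
import numpy as np

def local_gd_with_periodic_averaging(grads, x0, num_rounds=20, local_steps=5, lr=0.1):
    """Each client runs `local_steps` gradient steps on its own objective,
    then the server averages the resulting iterates. Communication happens
    once per round instead of once per gradient step."""
    x = x0.copy()
    for _ in range(num_rounds):
        local_iterates = []
        for grad in grads:                   # in practice this loop runs in parallel on the clients
            x_local = x.copy()
            for _ in range(local_steps):
                x_local -= lr * grad(x_local)
            local_iterates.append(x_local)
        x = np.mean(local_iterates, axis=0)  # the single communication round
    return x

# Illustrative heterogeneous quadratics f_i(x) = 0.5 * ||x - b_i||^2; minimizer is mean(b_i).
rng = np.random.default_rng(2)
bs = [rng.standard_normal(3) for _ in range(4)]
grads = [lambda x, b=b: x - b for b in bs]
x_final = local_gd_with_periodic_averaging(grads, x0=np.zeros(3))
print(x_final, "vs", np.mean(bs, axis=0))
```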

Towards a better theoretical understanding of independent subnetwork training

E Shulgin, P Richtárik - arXiv preprint arXiv:2306.16484, 2023 - arxiv.org
Modern advancements in large-scale machine learning would be impossible without the
paradigm of data-parallel distributed computing. Since distributed computing with large …