Federated optimization algorithms with random reshuffling and gradient compression
Gradient compression is a popular technique for improving the communication complexity of
stochastic first-order methods in distributed training of machine learning models. However …
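The two ingredients named in the title are standard building blocks; purely as an illustration (not the paper's algorithm), here is a minimal sketch of random reshuffling combined with a top-k gradient compressor on a least-squares problem. The step size, sparsity level, and problem data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 20
A = rng.normal(size=(n, d))
b = rng.normal(size=n)
x = np.zeros(d)
lr, k = 0.01, 5  # hypothetical step size and sparsity level

def top_k(g, k):
    """Keep only the k largest-magnitude coordinates (a standard biased compressor)."""
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]
    out[idx] = g[idx]
    return out

for epoch in range(50):
    perm = rng.permutation(n)          # random reshuffling: a fresh permutation each epoch
    for i in perm:                     # one pass over the data without replacement
        grad = (A[i] @ x - b[i]) * A[i]
        x -= lr * top_k(grad, k)       # apply (or communicate) only the compressed gradient
```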
Communication acceleration of local gradient methods via an accelerated primal-dual algorithm with an inexact prox
Inspired by a recent breakthrough of Mishchenko et al. [2022], who for the first time showed
that local gradient steps can lead to provable communication acceleration, we propose an …
Improving accelerated federated learning with compression and importance sampling
Federated Learning is a collaborative training framework that leverages heterogeneous data
distributed across a vast number of clients. Since it is practically infeasible to request and …
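The abstract refers to sampling a subset of clients each round. As a point of reference only (not the paper's importance-sampling scheme), non-uniform client sampling with the standard unbiasedness correction can be sketched as follows; the per-client weights are assumed stand-ins, e.g. local dataset sizes.

```python
import numpy as np

rng = np.random.default_rng(0)
num_clients, d = 10, 5
client_grads = rng.normal(size=(num_clients, d))   # stand-ins for local gradients
weights = rng.uniform(1, 10, size=num_clients)     # assumed importance weights
p = weights / weights.sum()                        # sampling probabilities

sampled = rng.choice(num_clients, size=3, replace=True, p=p)
# Divide by (num_clients * p_i) so the estimator is unbiased for the uniform average.
estimate = np.mean([client_grads[i] / (num_clients * p[i]) for i in sampled], axis=0)
full_average = client_grads.mean(axis=0)           # what the estimator approximates
```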
Searching for optimal per-coordinate step-sizes with multidimensional backtracking
The backtracking line-search is an effective technique to automatically tune the step-size in
smooth optimization. It guarantees similar performance to using the theoretically optimal …
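For context, the standard scalar backtracking (Armijo) line-search that the paper generalizes to per-coordinate step sizes looks roughly like the sketch below; the constants and the toy quadratic are illustrative, not taken from the paper.

```python
import numpy as np

def backtracking_gd(f, grad_f, x0, alpha0=1.0, beta=0.5, c=1e-4, iters=100):
    """Gradient descent with Armijo backtracking: shrink the step until
    the sufficient-decrease condition f(x - a*g) <= f(x) - c*a*||g||^2 holds."""
    x = x0
    for _ in range(iters):
        g = grad_f(x)
        a = alpha0
        while f(x - a * g) > f(x) - c * a * (g @ g):
            a *= beta                    # backtrack: shrink the trial step size
        x = x - a * g
    return x

# toy quadratic: ill-conditioned, so step-size tuning matters
Q = np.diag([1.0, 100.0])
f = lambda x: 0.5 * x @ Q @ x
grad_f = lambda x: Q @ x
x_star = backtracking_gd(f, grad_f, np.array([1.0, 1.0]))
```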
Fine-tuning language models over slow networks using activation quantization with guarantees
Communication compression is a crucial technique for modern distributed learning systems
to alleviate their communication bottlenecks over slower networks. Despite recent intensive …
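As background only (this is not the paper's guaranteed scheme), uniform stochastic quantization of activations to a few bits, the basic primitive behind activation compression, can be sketched as:

```python
import numpy as np

def quantize(x, bits=4):
    """Uniform stochastic quantization of a tensor to 2**bits levels over its range."""
    rng = np.random.default_rng(0)
    lo, hi = x.min(), x.max()
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    t = (x - lo) / scale                          # position in units of the quantization step
    q = np.floor(t) + (rng.random(x.shape) < (t - np.floor(t)))  # unbiased stochastic rounding
    return q * scale + lo                         # dequantized activations sent over the network

activations = np.random.default_rng(1).normal(size=(8, 16))
compressed = quantize(activations, bits=4)
error = np.abs(compressed - activations).max()    # bounded by one quantization step
```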
Knowledge distillation performs partial variance reduction
Knowledge distillation is a popular approach for enhancing the performance of "student"
models, with lower representational capacity, by taking advantage of more …
student" models, with lower representational capacity, by taking advantage of more …
Distributed optimization for overparameterized problems: Achieving optimal dimension independent communication complexity
Decentralized optimization is playing an important role in applications such as training
large machine learning models, among others. Despite its superior practical performance …
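For orientation only, the generic decentralized (gossip-based) gradient descent pattern that such work builds on, with a doubly stochastic mixing matrix over a ring of nodes, can be sketched as below; it is not the paper's method, and the quadratic objectives and step size are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, d = 5, 3
targets = rng.normal(size=(num_nodes, d))   # node i minimizes ||x - targets[i]||^2 / 2
X = np.zeros((num_nodes, d))                # one local iterate per node

# Doubly stochastic mixing matrix for a ring: average with both neighbours.
W = np.zeros((num_nodes, num_nodes))
for i in range(num_nodes):
    W[i, i] = 0.5
    W[i, (i - 1) % num_nodes] = 0.25
    W[i, (i + 1) % num_nodes] = 0.25

for t in range(200):
    grads = X - targets                     # local gradients
    X = W @ X - 0.1 * grads                 # gossip averaging plus a local gradient step

consensus = X.mean(axis=0)                  # approaches targets.mean(axis=0)
```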
FedP3: Federated Personalized and Privacy-friendly Network Pruning under Model Heterogeneity
The interest in federated learning has surged in recent research due to its unique ability to
train a global model using privacy-secured information held locally on each client. This …
train a global model using privacy-secured information held locally on each client. This …
GradSkip: Communication-accelerated local gradient methods with better computational complexity
We study a class of distributed optimization algorithms that aim to alleviate high
communication costs by allowing the clients to perform multiple local gradient-type training …
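The class of methods referred to above alternates several local gradient steps with periodic averaging. A generic sketch of that pattern (not GradSkip itself, whose step-skipping rule is the paper's contribution) on toy quadratic client objectives, with an assumed step size, is:

```python
import numpy as np

rng = np.random.default_rng(0)
num_clients, d, local_steps = 4, 6, 10
targets = rng.normal(size=(num_clients, d))   # client i minimizes ||x - targets[i]||^2 / 2
x_global = np.zeros(d)

for round_ in range(50):                      # each round costs one communication
    local_models = np.tile(x_global, (num_clients, 1))
    for _ in range(local_steps):              # several cheap local gradient steps
        local_models -= 0.1 * (local_models - targets)
    x_global = local_models.mean(axis=0)      # server averages the local models
```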
Towards a better theoretical understanding of independent subnetwork training
Modern advancements in large-scale machine learning would be impossible without the
paradigm of data-parallel distributed computing. Since distributed computing with large …
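Independent subnetwork training partitions the model itself across workers, each of which updates only its own part. A deliberately simplified sketch (a coordinate partition of a linear model, with a hypothetical step size, rather than a real subnetwork split) is:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, num_workers = 200, 10, 2
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d)
w = np.zeros(d)

# Fixed partition of coordinates across workers (a simplistic stand-in for subnetworks).
masks = [np.arange(d) % num_workers == k for k in range(num_workers)]

for it in range(300):
    new_w = w.copy()
    for k, mask in enumerate(masks):
        grad = A.T @ (A @ w - b) / n              # full gradient; worker k uses only its block
        new_w[mask] = w[mask] - 0.05 * grad[mask] # update only this worker's parameters
    w = new_w                                     # server stitches the blocks back together
```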