Federated optimization algorithms with random reshuffling and gradient compression
Gradient compression is a popular technique for improving the communication complexity of
stochastic first-order methods in distributed training of machine learning models. However …
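The two ingredients named in the title are standard building blocks; purely as an illustration (not the paper's algorithm), here is a minimal sketch of random reshuffling combined with a top-k gradient compressor on a least-squares problem. The step size, sparsity level, and problem data are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 20
A = rng.normal(size=(n, d))
b = rng.normal(size=n)
x = np.zeros(d)
lr, k = 0.01, 5  # hypothetical step size and sparsity level

def top_k(g, k):
    """Keep only the k largest-magnitude coordinates (a standard biased compressor)."""
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]
    out[idx] = g[idx]
    return out

for epoch in range(50):
    perm = rng.permutation(n)          # random reshuffling: a fresh permutation each epoch
    for i in perm:                     # one pass over the data without replacement
        grad = (A[i] @ x - b[i]) * A[i]
        x -= lr * top_k(grad, k)       # apply (or communicate) only the compressed gradient
```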
Communication acceleration of local gradient methods via an accelerated primal-dual algorithm with an inexact prox
Inspired by a recent breakthrough of Mishchenko et al. [2022], who for the first time showed
that local gradient steps can lead to provable communication acceleration, we propose an …
Improving accelerated federated learning with compression and importance sampling
Federated Learning is a collaborative training framework that leverages heterogeneous data
distributed across a vast number of clients. Since it is practically infeasible to request and …
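The abstract refers to sampling a subset of clients each round. As a point of reference only (not the paper's importance-sampling scheme), non-uniform client sampling with the standard unbiasedness correction can be sketched as follows; the per-client weights are assumed stand-ins, e.g. local dataset sizes.

```python
import numpy as np

rng = np.random.default_rng(0)
num_clients, d = 10, 5
client_grads = rng.normal(size=(num_clients, d))   # stand-ins for local gradients
weights = rng.uniform(1, 10, size=num_clients)     # assumed importance weights
p = weights / weights.sum()                        # sampling probabilities

sampled = rng.choice(num_clients, size=3, replace=True, p=p)
# Divide by (num_clients * p_i) so the estimator is unbiased for the uniform average.
estimate = np.mean([client_grads[i] / (num_clients * p[i]) for i in sampled], axis=0)
full_average = client_grads.mean(axis=0)           # what the estimator approximates
```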
Searching for optimal per-coordinate step-sizes with multidimensional backtracking
The backtracking line-search is an effective technique to automatically tune the step-size in
smooth optimization. It guarantees similar performance to using the theoretically optimal …
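For context, the standard scalar backtracking (Armijo) line-search that the paper generalizes to per-coordinate step sizes looks roughly like the sketch below; the constants and the toy quadratic are illustrative, not taken from the paper.

```python
import numpy as np

def backtracking_gd(f, grad_f, x0, alpha0=1.0, beta=0.5, c=1e-4, iters=100):
    """Gradient descent with Armijo backtracking: shrink the step until
    the sufficient-decrease condition f(x - a*g) <= f(x) - c*a*||g||^2 holds."""
    x = x0
    for _ in range(iters):
        g = grad_f(x)
        a = alpha0
        while f(x - a * g) > f(x) - c * a * (g @ g):
            a *= beta                    # backtrack: shrink the trial step size
        x = x - a * g
    return x

# toy quadratic: ill-conditioned, so step-size tuning matters
Q = np.diag([1.0, 100.0])
f = lambda x: 0.5 * x @ Q @ x
grad_f = lambda x: Q @ x
x_star = backtracking_gd(f, grad_f, np.array([1.0, 1.0]))
```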
Fine-tuning language models over slow networks using activation quantization with guarantees
Communication compression is a crucial technique for modern distributed learning systems
to alleviate their communication bottlenecks over slower networks. Despite recent intensive …
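As background only (this is not the paper's guaranteed scheme), uniform stochastic quantization of activations to a few bits, the basic primitive behind activation compression, can be sketched as:

```python
import numpy as np

def quantize(x, bits=4):
    """Uniform stochastic quantization of a tensor to 2**bits levels over its range."""
    rng = np.random.default_rng(0)
    lo, hi = x.min(), x.max()
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels if hi > lo else 1.0
    t = (x - lo) / scale                          # position in units of the quantization step
    q = np.floor(t) + (rng.random(x.shape) < (t - np.floor(t)))  # unbiased stochastic rounding
    return q * scale + lo                         # dequantized activations sent over the network

activations = np.random.default_rng(1).normal(size=(8, 16))
compressed = quantize(activations, bits=4)
error = np.abs(compressed - activations).max()    # bounded by one quantization step
```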
Knowledge distillation performs partial variance reduction
Knowledge distillation is a popular approach for enhancing the performance of "student"
models, with lower representational capacity, by taking advantage of more …
student" models, with lower representational capacity, by taking advantage of more …
Distributed optimization for overparameterized problems: Achieving optimal dimension independent communication complexity
Decentralized optimization is playing an important role in applications such as training
large machine learning models, among others. Despite its superior practical performance …
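For orientation only, the generic decentralized (gossip-based) gradient descent pattern that such work builds on, with a doubly stochastic mixing matrix over a ring of nodes, can be sketched as below; it is not the paper's method, and the quadratic objectives and step size are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
num_nodes, d = 5, 3
targets = rng.normal(size=(num_nodes, d))   # node i minimizes ||x - targets[i]||^2 / 2
X = np.zeros((num_nodes, d))                # one local iterate per node

# Doubly stochastic mixing matrix for a ring: average with both neighbours.
W = np.zeros((num_nodes, num_nodes))
for i in range(num_nodes):
    W[i, i] = 0.5
    W[i, (i - 1) % num_nodes] = 0.25
    W[i, (i + 1) % num_nodes] = 0.25

for t in range(200):
    grads = X - targets                     # local gradients
    X = W @ X - 0.1 * grads                 # gossip averaging plus a local gradient step

consensus = X.mean(axis=0)                  # approaches targets.mean(axis=0)
```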
FedP3: Federated Personalized and Privacy-friendly Network Pruning under Model Heterogeneity
The interest in federated learning has surged in recent research due to its unique ability to
train a global model using privacy-secured information held locally on each client. This …
train a global model using privacy-secured information held locally on each client. This …
GradSkip: Communication-accelerated local gradient methods with better computational complexity
We study a class of distributed optimization algorithms that aim to alleviate high
communication costs by allowing the clients to perform multiple local gradient-type training …
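The class of methods referred to above alternates several local gradient steps with periodic averaging. A generic sketch of that pattern (not GradSkip itself, whose step-skipping rule is the paper's contribution) on toy quadratic client objectives, with an assumed step size, is:

```python
import numpy as np

rng = np.random.default_rng(0)
num_clients, d, local_steps = 4, 6, 10
targets = rng.normal(size=(num_clients, d))   # client i minimizes ||x - targets[i]||^2 / 2
x_global = np.zeros(d)

for round_ in range(50):                      # each round costs one communication
    local_models = np.tile(x_global, (num_clients, 1))
    for _ in range(local_steps):              # several cheap local gradient steps
        local_models -= 0.1 * (local_models - targets)
    x_global = local_models.mean(axis=0)      # server averages the local models
```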
Towards a better theoretical understanding of independent subnetwork training
Modern advancements in large-scale machine learning would be impossible without the
paradigm of data-parallel distributed computing. Since distributed computing with large …
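Independent subnetwork training partitions the model itself across workers, each of which updates only its own part. A deliberately simplified sketch (a coordinate partition of a linear model, with a hypothetical step size, rather than a real subnetwork split) is:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, num_workers = 200, 10, 2
A = rng.normal(size=(n, d))
b = A @ rng.normal(size=d)
w = np.zeros(d)

# Fixed partition of coordinates across workers (a simplistic stand-in for subnetworks).
masks = [np.arange(d) % num_workers == k for k in range(num_workers)]

for it in range(300):
    new_w = w.copy()
    for k, mask in enumerate(masks):
        grad = A.T @ (A @ w - b) / n              # full gradient; worker k uses only its block
        new_w[mask] = w[mask] - 0.05 * grad[mask] # update only this worker's parameters
    w = new_w                                     # server stitches the blocks back together
```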