Federated learning of a mixture of global and local models

F Hanzely, P Richtárik - arXiv preprint arXiv:2002.05516, 2020 - arxiv.org
We propose a new optimization formulation for training federated learning models. The
standard formulation has the form of an empirical risk minimization problem constructed to …
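For orientation, the sketch below writes out the mixed global/local objective this paper proposes, as best recalled here; the notation (local models x_i, penalty weight λ) is illustrative rather than a verbatim reproduction of the paper's statement.

```latex
% Sketch of the mixture objective: each device i keeps a local model x_i,
% and a penalty with weight \lambda pulls the local models toward their
% average \bar{x}; \lambda = 0 gives purely local training, \lambda \to \infty
% recovers a single global model.
\min_{x_1,\dots,x_n \in \mathbb{R}^d}\;
  \frac{1}{n}\sum_{i=1}^{n} f_i(x_i)
  \;+\; \frac{\lambda}{2n}\sum_{i=1}^{n} \bigl\| x_i - \bar{x} \bigr\|^2,
\qquad \bar{x} := \frac{1}{n}\sum_{i=1}^{n} x_i .
```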

Lower bounds and optimal algorithms for personalized federated learning

F Hanzely, S Hanzely, S Horváth… - Advances in Neural …, 2020 - proceedings.neurips.cc
In this work, we consider the optimization formulation of personalized federated learning
recently introduced by Hanzely & Richtarik (2020) which was shown to give an alternative …

PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization

Z Li, H Bao, X Zhang… - … conference on machine …, 2021 - proceedings.mlr.press
In this paper, we propose a novel stochastic gradient estimator—ProbAbilistic Gradient
Estimator (PAGE)—for nonconvex optimization. PAGE is easy to implement as it is designed …
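A minimal sketch of the PAGE-style estimator on a toy least-squares problem is shown below; the problem instance, step size, batch size, and switching probability p are illustrative assumptions, not the paper's recommended setup.

```python
# Sketch of a PAGE-style update: with probability p recompute the full
# gradient, otherwise reuse the previous estimator corrected by a minibatch
# gradient difference. Toy least-squares problem; hyperparameters are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 10
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def full_grad(x):
    """Full gradient of f(x) = (1/2n) * ||Ax - b||^2."""
    return A.T @ (A @ x - b) / n

def batch_grad(x, idx):
    """Minibatch gradient over the rows indexed by idx."""
    Ai = A[idx]
    return Ai.T @ (Ai @ x - b[idx]) / len(idx)

x = np.zeros(d)
g = full_grad(x)                      # start from a full gradient
eta, batch, p = 0.05, 8, 0.1          # illustrative hyperparameters

for t in range(500):
    x_new = x - eta * g
    if rng.random() < p:
        g = full_grad(x_new)          # prob p: recompute the full gradient
    else:
        idx = rng.choice(n, size=batch, replace=False)
        # prob 1 - p: reuse g, corrected by a minibatch gradient difference
        g = g + batch_grad(x_new, idx) - batch_grad(x, idx)
    x = x_new
```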

Acceleration for compressed gradient descent in distributed and federated optimization

Z Li, D Kovalev, X Qian, P Richtárik - arXiv preprint arXiv:2002.11364, 2020 - arxiv.org
Due to the high communication cost in distributed and federated learning problems,
methods relying on compression of communicated messages are becoming increasingly …
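For context, the sketch below shows one common unbiased compression operator (random sparsification, rand-k) of the kind such methods communicate; it is illustrative only and does not reproduce this paper's specific compressor or its acceleration scheme.

```python
# Illustrative unbiased rand-k sparsifier: keep k random coordinates,
# rescaled by d/k so that E[C(v)] = v. Not the paper's method, just an
# example of a compressed message.
import numpy as np

def rand_k(v, k, rng):
    """Keep k random coordinates of v, scaled by d/k to stay unbiased."""
    d = v.shape[0]
    out = np.zeros_like(v)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = v[idx] * (d / k)
    return out

rng = np.random.default_rng(0)
g = rng.standard_normal(1000)
compressed = rand_k(g, k=50, rng=rng)   # only 5% of coordinates are sent
```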

Stochastic gradient descent-ascent: Unified theory and new efficient methods

A Beznosikov, E Gorbunov… - International …, 2023 - proceedings.mlr.press
Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent
algorithms for solving min-max optimization and variational inequality problems (VIP) …
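As a reference point, the basic SGDA step for a min-max problem min_x max_y f(x, y) reads as below; the notation (stochastic gradients g_x, g_y and step sizes γ_x, γ_y) is illustrative, and the paper's unified theory covers many variants of this update.

```latex
% One SGDA step: descend in x, ascend in y, using stochastic gradients.
x^{t+1} = x^{t} - \gamma_x\, g_x(x^{t}, y^{t}), \qquad
y^{t+1} = y^{t} + \gamma_y\, g_y(x^{t}, y^{t}).
```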

Variance reduction is an antidote to Byzantines: Better rates, weaker assumptions and communication compression as a cherry on the top

E Gorbunov, S Horváth, P Richtárik, G Gidel - arXiv preprint arXiv …, 2022 - arxiv.org
Byzantine-robustness has been gaining a lot of attention due to the growing interest in
collaborative and federated learning. However, many fruitful directions, such as the usage of …

Stochastic Hamiltonian gradient methods for smooth games

N Loizou, H Berard… - International …, 2020 - proceedings.mlr.press
The success of adversarial formulations in machine learning has brought renewed
motivation for smooth games. In this work, we focus on the class of stochastic Hamiltonian …
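A sketch of the Hamiltonian viewpoint, as recalled here: for a smooth game with vector field ξ(w) (for a min-max game, ξ = (∇_x f, -∇_y f)), Hamiltonian gradient methods descend on the squared norm of ξ; the stochastic variants studied in the paper replace the exact gradient of the Hamiltonian with estimators. The notation below is illustrative.

```latex
% Hamiltonian of the game and one (deterministic) Hamiltonian gradient step.
\mathcal{H}(w) := \tfrac{1}{2}\,\|\xi(w)\|^{2}, \qquad
w^{t+1} = w^{t} - \gamma\, \nabla \mathcal{H}(w^{t}).
```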

Error compensated distributed SGD can be accelerated

X Qian, P Richtárik, T Zhang - Advances in Neural …, 2021 - proceedings.neurips.cc
Gradient compression is a recent and increasingly popular technique for reducing the
communication cost in distributed training of large-scale machine learning models. In this …
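Below is a minimal sketch of the classical error-feedback (error-compensation) mechanism that such methods build on: each worker stores the part of its update that the compressor discarded and adds it back in the next round. The compressor, toy objective, and step size are illustrative assumptions, and the paper's accelerated variant is not shown.

```python
# Error-feedback sketch: compress the error-corrected update, remember what
# was not transmitted, and apply only the compressed part to the model.
import numpy as np

def top_k(v, k):
    """Biased top-k sparsifier: keep the k largest-magnitude coordinates."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

rng = np.random.default_rng(0)
d, eta, k = 100, 0.1, 10
x = np.zeros(d)
e = np.zeros(d)                       # local error memory
target = rng.standard_normal(d)

for t in range(200):
    g = x - target                    # gradient of 0.5 * ||x - target||^2
    c = top_k(e + eta * g, k)         # compress the error-corrected update
    e = e + eta * g - c               # store what was not transmitted
    x = x - c                         # apply only the compressed update
```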

Lower complexity bounds of finite-sum optimization problems: The results and construction

Y Han, G **e, Z Zhang - Journal of Machine Learning Research, 2024 - jmlr.org
In this paper we study the lower complexity bounds for finite-sum optimization problems,
where the objective is the average of $ n $ individual component functions. We consider a …
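Written out, the finite-sum template described in the snippet (the objective is the average of n component functions) is:

```latex
\min_{x \in \mathbb{R}^d} \; f(x) := \frac{1}{n}\sum_{i=1}^{n} f_i(x).
```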

An optimal algorithm for decentralized finite-sum optimization

H Hendrikx, F Bach, L Massoulie - SIAM Journal on Optimization, 2021 - SIAM
Modern large-scale finite-sum optimization relies on two key aspects: distribution and
stochastic updates. For smooth and strongly convex problems, existing decentralized …