Recent advances in stochastic gradient descent in deep learning

Y Tian, Y Zhang, H Zhang - Mathematics, 2023 - mdpi.com
In the age of artificial intelligence, finding the best approach to handling huge amounts of data is a
tremendously motivating and hard problem. Among machine learning models, stochastic …
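
As a point of reference for the survey above, here is a minimal sketch of the basic SGD update on a least-squares toy problem; the toy data, step size, and iteration count are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Toy least-squares problem: minimize (1/n) * sum_i (a_i . w - b_i)^2
rng = np.random.default_rng(0)
A = rng.normal(size=(1000, 10))
b = A @ rng.normal(size=10) + 0.1 * rng.normal(size=1000)

w = np.zeros(10)
lr = 0.01
for step in range(5000):
    i = rng.integers(len(b))                 # sample one data point
    grad = 2 * (A[i] @ w - b[i]) * A[i]      # stochastic gradient at that point
    w -= lr * grad                           # SGD update: w <- w - lr * g
print("final loss:", np.mean((A @ w - b) ** 2))
```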

Variance-reduced methods for machine learning

RM Gower, M Schmidt, F Bach… - Proceedings of the …, 2020 - ieeexplore.ieee.org
Stochastic optimization lies at the heart of machine learning, and its cornerstone is
stochastic gradient descent (SGD), a method introduced over 60 years ago. The last eight …
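
To make the idea concrete, here is a rough SVRG-style sketch of one representative variance-reduced gradient estimator (one member of the family surveyed); the quadratic toy problem and hyperparameters are my own illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(500, 10))
b = A @ rng.normal(size=10)

def grad_i(w, i):                        # gradient of the i-th squared residual
    return 2 * (A[i] @ w - b[i]) * A[i]

w, lr = np.zeros(10), 0.005
for epoch in range(30):
    w_snap = w.copy()
    mu = 2 * A.T @ (A @ w_snap - b) / len(b)   # full gradient at the snapshot
    for _ in range(len(b)):
        i = rng.integers(len(b))
        # Variance-reduced estimator: unbiased, with variance shrinking near the snapshot.
        g = grad_i(w, i) - grad_i(w_snap, i) + mu
        w -= lr * g
print("loss:", np.mean((A @ w - b) ** 2))
```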

Towards understanding sharpness-aware minimization

M Andriushchenko… - … Conference on Machine …, 2022 - proceedings.mlr.press
Sharpness-Aware Minimization (SAM) is a recent training method that relies on
worst-case weight perturbations and significantly improves generalization in various …
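
A back-of-the-envelope sketch of the SAM update the abstract refers to: an ascent step to a worst-case perturbation within radius rho, followed by a descent step using the gradient at the perturbed point. The toy objective, rho, and step size are assumptions for illustration only.

```python
import numpy as np

def loss(w):            # toy non-convex objective standing in for a training loss
    return np.sum(np.sin(3 * w) + w ** 2)

def grad(w):
    return 3 * np.cos(3 * w) + 2 * w

w, lr, rho = np.array([2.0, -1.5]), 0.05, 0.1
for _ in range(200):
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # worst-case (ascent) perturbation
    w = w - lr * grad(w + eps)                   # descend using the perturbed gradient
print("w:", w, "loss:", loss(w))
```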

A unified theory of decentralized SGD with changing topology and local updates

A Koloskova, N Loizou, S Boreiri… - International …, 2020 - proceedings.mlr.press
Decentralized stochastic optimization methods have gained a lot of attention recently, mainly
because of their cheap per-iteration cost, data locality, and communication efficiency. In …
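
For intuition, here is a small sketch of one decentralized SGD round in the spirit of that framework: each node takes a local stochastic gradient step and then averages with its neighbours via a mixing (gossip) matrix. The ring topology, mixing weights, and toy quadratic objectives are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_nodes, dim = 8, 5
targets = rng.normal(size=(n_nodes, dim))     # node i minimizes ||x - target_i||^2
X = np.zeros((n_nodes, dim))                  # one parameter vector per node

# Doubly stochastic mixing matrix for a ring: average with the two neighbours.
W = np.zeros((n_nodes, n_nodes))
for i in range(n_nodes):
    W[i, i] = 0.5
    W[i, (i - 1) % n_nodes] = 0.25
    W[i, (i + 1) % n_nodes] = 0.25

lr = 0.1
for t in range(200):
    grads = 2 * (X - targets) + 0.1 * rng.normal(size=X.shape)  # noisy local gradients
    X = W @ (X - lr * grads)                  # local step, then gossip averaging
print("average of node models:", X.mean(axis=0))
print("minimizer of the average objective:", targets.mean(axis=0))
```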

ProxSkip: Yes! Local gradient steps provably lead to communication acceleration! Finally!

K Mishchenko, G Malinovsky, S Stich… - International …, 2022 - proceedings.mlr.press
We introduce ProxSkip—a surprisingly simple and provably efficient method for minimizing
the sum of a smooth ($ f $) and an expensive nonsmooth proximable ($\psi $) function. The …
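
A sketch of the prox-skipping idea as I read it from the abstract: a control-variate-shifted gradient step at every iteration, with the expensive prox applied only with probability p. The L1-regularized least-squares toy problem, step size, and p are my own illustrative choices, and the details may differ from the paper's algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(100, 20))
b = A @ (rng.normal(size=20) * (rng.random(20) < 0.3))   # sparse ground truth
lam = 0.5

def grad_f(x):                         # gradient of the smooth part f(x) = 0.5*||Ax - b||^2
    return A.T @ (A @ x - b)

def prox_psi(v, step):                 # prox of psi(x) = lam*||x||_1 (soft-thresholding)
    return np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)

gamma = 1.0 / np.linalg.norm(A, 2) ** 2
p = 0.2                                # probability of actually taking the prox step
x, h = np.zeros(20), np.zeros(20)
for t in range(3000):
    x_hat = x - gamma * (grad_f(x) - h)          # shifted gradient step
    if rng.random() < p:                         # prox is evaluated only occasionally
        x = prox_psi(x_hat - (gamma / p) * h, gamma / p)
    else:
        x = x_hat
    h = h + (p / gamma) * (x - x_hat)            # control-variate update
print("nonzeros in solution:", np.count_nonzero(np.abs(x) > 1e-6))
```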

Tighter theory for local SGD on identical and heterogeneous data

A Khaled, K Mishchenko… - … Conference on Artificial …, 2020 - proceedings.mlr.press
We provide a new analysis of local SGD, removing unnecessary assumptions and
elaborating on the difference between two data regimes: identical and heterogeneous. In …
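
A minimal sketch of the local SGD pattern analysed above: each of M workers runs K local SGD steps on its own data, then the iterates are averaged. The toy quadratics (one optimum per worker, i.e. heterogeneous data) and hyperparameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers, dim, K, lr = 4, 5, 10, 0.05
optima = rng.normal(size=(n_workers, dim))   # heterogeneous data: one optimum per worker

x = np.zeros(dim)                            # shared starting point for each round
for rnd in range(100):
    local = np.tile(x, (n_workers, 1))
    for k in range(K):                       # K local SGD steps, no communication
        noise = 0.1 * rng.normal(size=local.shape)
        local -= lr * (2 * (local - optima) + noise)
    x = local.mean(axis=0)                   # communicate: average the local iterates
print("averaged model:", x)
print("minimizer of the average objective:", optima.mean(axis=0))
```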

Federated learning of a mixture of global and local models

F Hanzely, P Richtárik - arXiv preprint arXiv:2002.05516, 2020 - arxiv.org
We propose a new optimization formulation for training federated learning models. The
standard formulation has the form of an empirical risk minimization problem constructed to …
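
To illustrate the kind of formulation the abstract refers to, here is a sketch of a penalized objective that mixes per-client (local) models with their average: a small penalty weight keeps models essentially local, a large one pushes them towards a single global model. The exact objective, weighting, and solver here are my own illustrative guess, not necessarily the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim, lam, lr = 5, 3, 2.0, 0.1
optima = rng.normal(size=(n_clients, dim))        # each client's purely local optimum

# Illustrative objective: (1/n) sum_i f_i(x_i) + lam/(2n) sum_i ||x_i - mean(x)||^2
X = np.zeros((n_clients, dim))                    # one model per client
for t in range(500):
    mean = X.mean(axis=0)
    grad_f = 2 * (X - optima) / n_clients         # gradients of the local losses f_i
    grad_pen = lam * (X - mean) / n_clients       # gradient of the mixing penalty
    X -= lr * (grad_f + grad_pen)

print("client models:\n", np.round(X, 3))
print("a purely global model would be:", np.round(optima.mean(axis=0), 3))
```

With this penalty, each client model ends up interpolating between its own local optimum and the shared average, which is the trade-off the formulation is meant to expose.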

Federated learning with compression: Unified analysis and sharp guarantees

F Haddadpour, MM Kamani… - International …, 2021 - proceedings.mlr.press
In federated learning, communication cost is often a critical bottleneck when scaling up distributed
optimization algorithms to collaboratively learn a model from millions of devices with …
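
To make "compression" concrete, here is a sketch of one round of compressed gradient aggregation using top-k sparsification, one standard compressor; the paper's analysis covers a broader class of compression operators, and this toy setup is my own.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim, k = 10, 1000, 50          # each client sends only k of dim coordinates

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

full_grads = rng.normal(size=(n_clients, dim))         # clients' local gradients
compressed = np.array([top_k(g, k) for g in full_grads])
server_estimate = compressed.mean(axis=0)              # server averages sparse updates

true_avg = full_grads.mean(axis=0)
err = np.linalg.norm(server_estimate - true_avg) / np.linalg.norm(true_avg)
print(f"each client sends ~{k}/{dim} of its coordinates; relative aggregation error: {err:.2f}")
```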

The effect of choosing optimizer algorithms to improve computer vision tasks: a comparative study

E Hassan, MY Shams, NA Hikal, S Elmougy - Multimedia Tools and …, 2023 - Springer
Optimization algorithms are used to improve model accuracy. The optimization process
undergoes multiple cycles until convergence. A variety of optimization strategies have been …
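
As a miniature version of such a comparison, here is a sketch that runs plain SGD, SGD with momentum, and Adam on the same toy objective and reports the final loss; the toy problem and hyperparameters are illustrative assumptions, not the paper's benchmark.

```python
import numpy as np

def loss_and_grad(w):
    # Mildly ill-conditioned quadratic standing in for a training loss.
    scales = np.array([0.5, 1.0, 5.0])
    return np.sum(scales * w ** 2), 2 * scales * w

def run(optimizer, steps=500, lr=0.01):
    w = np.ones(3)
    m = np.zeros(3); v = np.zeros(3)              # state for momentum / Adam
    for t in range(1, steps + 1):
        _, g = loss_and_grad(w)
        if optimizer == "sgd":
            w -= lr * g
        elif optimizer == "momentum":
            m = 0.9 * m + g
            w -= lr * m
        elif optimizer == "adam":
            m = 0.9 * m + 0.1 * g
            v = 0.999 * v + 0.001 * g ** 2
            m_hat = m / (1 - 0.9 ** t); v_hat = v / (1 - 0.999 ** t)
            w -= lr * m_hat / (np.sqrt(v_hat) + 1e-8)
    return loss_and_grad(w)[0]

for name in ["sgd", "momentum", "adam"]:
    print(name, run(name))
```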

Optimal client sampling for federated learning

W Chen, S Horvath, P Richtarik - arXiv preprint arXiv:2010.13723, 2020 - arxiv.org
It is well understood that client-master communication can be a primary bottleneck in
Federated Learning. In this work, we address this issue with a novel client subsampling …
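
For a rough picture of client subsampling, here is a sketch of one aggregation round in which the server samples a few clients with probabilities proportional to the norms of their updates and reweights them so the aggregate stays unbiased. This is a generic importance-sampling illustration under assumed toy data, not necessarily the optimal scheme derived in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n_clients, dim, m = 100, 20, 10             # only m of n clients report each round
updates = rng.normal(size=(n_clients, dim)) * rng.exponential(1.0, size=(n_clients, 1))

norms = np.linalg.norm(updates, axis=1)
probs = norms / norms.sum()                 # sample "important" (large) updates more often

chosen = rng.choice(n_clients, size=m, replace=True, p=probs)
# Inverse-probability weighting keeps the estimate of the average update unbiased.
estimate = updates[chosen].T @ (1.0 / (n_clients * probs[chosen])) / m

true_avg = updates.mean(axis=0)
print("relative error:", np.linalg.norm(estimate - true_avg) / np.linalg.norm(true_avg))
```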