Recent advances in stochastic gradient descent in deep learning
In the age of artificial intelligence, finding the best approach to handling huge amounts of data is a
tremendously motivating and hard problem. Among machine learning models, stochastic …
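For reference, a single mini-batch SGD step has the form $w \leftarrow w - \gamma \nabla f_i(w)$, where $f_i$ is the loss on a randomly sampled batch. A minimal sketch in Python (the least-squares loss, step size, and batch size are illustrative placeholders, not choices taken from this survey):

    import numpy as np

    def sgd_step(w, X_batch, y_batch, lr=0.1):
        # Stochastic gradient of a mini-batch least-squares loss
        # L(w) = ||X_batch @ w - y_batch||^2 / (2 * batch_size).
        grad = X_batch.T @ (X_batch @ w - y_batch) / len(y_batch)
        return w - lr * grad

    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(1000, 5)), rng.normal(size=1000)
    w = np.zeros(5)
    for _ in range(200):
        idx = rng.choice(len(y), size=32, replace=False)  # sample a mini-batch
        w = sgd_step(w, X[idx], y[idx])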
Variance-reduced methods for machine learning
Stochastic optimization lies at the heart of machine learning, and its cornerstone is
stochastic gradient descent (SGD), a method introduced over 60 years ago. The last eight …
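To make "variance reduction" concrete, here is a hedged SVRG-style sketch, one common member of this family; the snapshot schedule and quadratic loss are illustrative assumptions rather than details from this survey:

    import numpy as np

    def grad(w, X, y):
        # Gradient of a least-squares loss on the rows (X, y).
        return X.T @ (X @ w - y) / len(y)

    def svrg_epoch(w, X, y, lr=0.1, inner_steps=100, seed=0):
        rng = np.random.default_rng(seed)
        w_snap = w.copy()
        mu = grad(w_snap, X, y)                 # full gradient at the snapshot
        for _ in range(inner_steps):
            i = rng.integers(len(y))
            xi, yi = X[i:i + 1], y[i:i + 1]
            # Stochastic gradient corrected by the snapshot: unbiased, lower variance.
            g = grad(w, xi, yi) - grad(w_snap, xi, yi) + mu
            w = w - lr * g
        return w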
Towards understanding sharpness-aware minimization
M Andriushchenko… - … Conference on Machine …, 2022 - proceedings.mlr.press
Sharpness-Aware Minimization (SAM) is a recent training method that relies on
worst-case weight perturbations and significantly improves generalization in various …
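A hedged sketch of the ascent/descent update commonly associated with SAM: perturb the weights in the worst-case (gradient) direction within a ball of radius rho, then descend using the gradient taken at the perturbed point. The radius and the loss function here are placeholders, not values analyzed in the paper:

    import numpy as np

    def sam_step(w, grad_fn, lr=0.1, rho=0.05):
        g = grad_fn(w)
        # Ascent step: approximate worst-case perturbation within an L2 ball of radius rho.
        eps = rho * g / (np.linalg.norm(g) + 1e-12)
        # Descent step: gradient at the perturbed weights, applied to the original weights.
        return w - lr * grad_fn(w + eps)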
A unified theory of decentralized SGD with changing topology and local updates
Decentralized stochastic optimization methods have gained a lot of attention recently, mainly
because of their cheap per-iteration cost, data locality, and communication efficiency. In …
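A minimal sketch of one round of decentralized SGD with gossip averaging; the doubly stochastic mixing matrix W (encoding the possibly changing topology) and the local gradient oracles are assumptions made for illustration:

    import numpy as np

    def decentralized_sgd_round(X_nodes, W, grad_fns, lr=0.1):
        # X_nodes: (n_nodes, dim) array of local iterates, one row per node.
        # W: doubly stochastic mixing matrix over the communication graph.
        n = len(X_nodes)
        # Each node takes a local stochastic gradient step on its own data ...
        stepped = np.stack([X_nodes[i] - lr * grad_fns[i](X_nodes[i]) for i in range(n)])
        # ... and then averages with its neighbors according to W.
        return W @ stepped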
ProxSkip: Yes! Local gradient steps provably lead to communication acceleration! Finally!
We introduce ProxSkip, a surprisingly simple and provably efficient method for minimizing
the sum of a smooth ($f$) and an expensive nonsmooth proximable ($\psi$) function. The …
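A hedged sketch of the prox-skipping idea: take a gradient step on $f$ at every iteration, apply the expensive prox of $\psi$ only with small probability $p$, and use a control variate $h$ to correct for the skipped steps. Notation and step-size choices below follow common conventions and are not taken verbatim from the paper:

    import numpy as np

    def proxskip(x0, grad_f, prox_psi, gamma=0.1, p=0.2, iters=1000, seed=0):
        # prox_psi(v, s) is assumed to return prox_{s * psi}(v).
        rng = np.random.default_rng(seed)
        x, h = x0.copy(), np.zeros_like(x0)
        for _ in range(iters):
            x_hat = x - gamma * (grad_f(x) - h)       # gradient step, shifted by the control variate
            if rng.random() < p:                      # expensive prox step, taken only rarely
                x = prox_psi(x_hat - (gamma / p) * h, gamma / p)
            else:                                     # otherwise skip it
                x = x_hat
            h = h + (p / gamma) * (x - x_hat)         # control-variate update
        return x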
Tighter theory for local SGD on identical and heterogeneous data
We provide a new analysis of local SGD, removing unnecessary assumptions and
elaborating on the difference between two data regimes: identical and heterogeneous. In …
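For concreteness, one communication round of local SGD with periodic averaging, the scheme this kind of analysis covers; the number of local steps and the local gradient oracles are illustrative assumptions:

    import numpy as np

    def local_sgd_round(w_global, local_grad_fns, lr=0.05, local_steps=10):
        # Each worker starts from the shared model, runs several SGD steps on its
        # own (identical or heterogeneous) data, and the results are averaged.
        finals = []
        for grad_fn in local_grad_fns:
            w = w_global.copy()
            for _ in range(local_steps):
                w = w - lr * grad_fn(w)
            finals.append(w)
        return np.mean(finals, axis=0)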
Federated learning of a mixture of global and local models
We propose a new optimization formulation for training federated learning models. The
standard formulation has the form of an empirical risk minimization problem constructed to …
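One standard way to write such a mixture objective couples local models $x_1,\dots,x_n$ to their mean through a penalty; the exact form below is an assumption for illustration, with $\lambda$ interpolating between purely local models at $\lambda = 0$ and a single global model as $\lambda \to \infty$:

    $\min_{x_1,\dots,x_n} \; \frac{1}{n}\sum_{i=1}^n f_i(x_i) \;+\; \frac{\lambda}{2n}\sum_{i=1}^n \|x_i - \bar{x}\|^2, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^n x_i.$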
Federated learning with compression: Unified analysis and sharp guarantees
In federated learning, communication cost is often a critical bottleneck when scaling up distributed
optimization algorithms to collaboratively learn a model from millions of devices with …
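A hedged sketch of the kind of unbiased compression operator such unified analyses typically cover, here random-k sparsification of client updates; the operator choice and the value of k are illustrative assumptions:

    import numpy as np

    def rand_k(v, k, rng):
        # Keep k random coordinates and rescale by d/k so that E[C(v)] = v.
        d = len(v)
        out = np.zeros_like(v)
        idx = rng.choice(d, size=k, replace=False)
        out[idx] = v[idx] * (d / k)
        return out

    # Usage: clients send compressed updates, the server averages them.
    rng = np.random.default_rng(0)
    updates = [rng.normal(size=100) for _ in range(10)]
    aggregated = np.mean([rand_k(u, k=10, rng=rng) for u in updates], axis=0)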
The effect of choosing optimizer algorithms to improve computer vision tasks: a comparative study
Optimization algorithms are used to improve model accuracy. The optimization process
undergoes multiple cycles until convergence. A variety of optimization strategies have been …
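For reference, the per-step updates of two optimizers such comparisons typically include, SGD with momentum and Adam; the hyperparameters below are common defaults, not values from the study:

    import numpy as np

    def sgd_momentum_step(w, g, state, lr=0.01, beta=0.9):
        state["v"] = beta * state.get("v", 0.0) + g          # velocity accumulates past gradients
        return w - lr * state["v"], state

    def adam_step(w, g, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        t = state.get("t", 0) + 1
        m = b1 * state.get("m", 0.0) + (1 - b1) * g          # first-moment estimate
        v = b2 * state.get("v", 0.0) + (1 - b2) * g ** 2     # second-moment estimate
        state.update(t=t, m=m, v=v)
        m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)  # bias correction
        return w - lr * m_hat / (np.sqrt(v_hat) + eps), state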
Optimal client sampling for federated learning
It is well understood that client-master communication can be a primary bottleneck in
Federated Learning. In this work, we address this issue with a novel client subsampling …
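A hedged sketch of importance-based client subsampling with unbiased reweighting; sampling probabilities proportional to update norms are one natural choice, used here purely to illustrate the general idea (in a real system the probabilities must be obtained without first collecting every update):

    import numpy as np

    def sampled_average(updates, m, rng):
        # Estimate the average of n client updates from only m sampled clients.
        # Each sampled update is reweighted by 1/(n * p_i) so the estimate is unbiased.
        n = len(updates)
        norms = np.array([np.linalg.norm(u) for u in updates])
        probs = norms / norms.sum()
        chosen = rng.choice(n, size=m, replace=True, p=probs)
        return sum(updates[i] / (n * probs[i]) for i in chosen) / m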