Recent advances in stochastic gradient descent in deep learning

Y Tian, Y Zhang, H Zhang - Mathematics, 2023 - mdpi.com
In the age of artificial intelligence, finding the best approach to handling huge amounts of data is a
tremendously motivating and hard problem. Among machine learning models, stochastic …
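
Since every entry below modifies the plain SGD update in one way or another, a minimal baseline sketch may help fix notation. The quadratic finite-sum objective and step size are illustrative choices, not taken from the survey:

```python
import numpy as np

def sgd(grad_fn, x0, n_samples, lr=0.1, epochs=5, seed=0):
    """Plain SGD: at each step, take a gradient step on one randomly drawn sample."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(epochs * n_samples):
        i = rng.integers(n_samples)          # sample an index uniformly with replacement
        x -= lr * grad_fn(x, i)              # stochastic gradient step
    return x

# Toy finite-sum least squares: f(x) = (1/n) * sum_i 0.5 * (a_i @ x - b_i)^2
rng = np.random.default_rng(1)
A, b = rng.normal(size=(50, 3)), rng.normal(size=50)
grad = lambda x, i: A[i] * (A[i] @ x - b[i])
print(sgd(grad, np.zeros(3), n_samples=50))
```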

Variance reduced ProxSkip: Algorithm, theory and application to federated learning

G Malinovsky, K Yi, P Richtárik - Advances in Neural …, 2022 - proceedings.neurips.cc
We study distributed optimization methods based on the local training (LT) paradigm,
i.e., methods which achieve communication efficiency by performing richer local gradient …
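
To make the LT paradigm concrete, here is a minimal local-SGD-style sketch: each client takes several local gradient steps between communication rounds and the server averages the results. This illustrates local training only, not the variance-reduced ProxSkip method of the paper; clients, step counts, and objectives are illustrative:

```python
import numpy as np

def local_sgd(client_grads, x0, lr=0.1, local_steps=10, rounds=20):
    """Local training (LT) sketch: each client runs several local gradient steps
    between communication rounds; the server then averages the local models.
    Illustrates the LT paradigm only, not the paper's variance-reduced ProxSkip."""
    x = np.array(x0, dtype=float)
    for _ in range(rounds):
        local_models = []
        for grad in client_grads:            # one gradient oracle per client
            y = x.copy()
            for _ in range(local_steps):     # richer local work, fewer communications
                y -= lr * grad(y)
            local_models.append(y)
        x = np.mean(local_models, axis=0)    # communication: average local iterates
    return x

# Two toy clients with quadratics centered at different points.
clients = [lambda y: y - 1.0, lambda y: y + 1.0]   # gradients of 0.5*(y-1)^2 and 0.5*(y+1)^2
print(local_sgd(clients, x0=np.zeros(2)))
```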

ProxSkip: Yes! Local gradient steps provably lead to communication acceleration! Finally!

K Mishchenko, G Malinovsky, S Stich… - International …, 2022 - proceedings.mlr.press
We introduce ProxSkip—a surprisingly simple and provably efficient method for minimizing
the sum of a smooth (f) and an expensive nonsmooth proximable (ψ) function. The …
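
A sketch of the ProxSkip-style update as I read it (step size, prox probability p, and the control variate h; consult the paper for the exact statement and theory). The L1 toy problem is an illustrative choice:

```python
import numpy as np

def prox_skip(grad_f, prox_psi, x0, lr=0.1, p=0.2, iters=2000, seed=0):
    """Sketch of a ProxSkip-style update (my reading of the method; see the paper
    for the exact statement): the expensive prox of psi is evaluated only with
    probability p, and a control variate h corrects for the skipped prox steps."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    h = np.zeros_like(x)                         # control variate
    for _ in range(iters):
        x_hat = x - lr * (grad_f(x) - h)         # gradient step, shifted by h
        if rng.random() < p:                     # occasionally pay for the prox
            x = prox_psi(x_hat - (lr / p) * h, lr / p)
        else:                                    # otherwise skip it
            x = x_hat
        h = h + (p / lr) * (x - x_hat)           # update the control variate
    return x

# Toy instance: f(x) = 0.5*||x - a||^2, psi(x) = lam*||x||_1 (soft-thresholding prox).
a, lam = np.array([1.0, -0.2, 0.05]), 0.1
grad_f = lambda x: x - a
prox_l1 = lambda v, step: np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)
print(prox_skip(grad_f, prox_l1, np.zeros(3)))
```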

Robustness to unbounded smoothness of generalized SignSGD

M Crawshaw, M Liu, F Orabona… - Advances in Neural …, 2022 - proceedings.neurips.cc
Traditional analyses in non-convex optimization typically rely on the smoothness
assumption, namely requiring the gradients to be Lipschitz. However, recent evidence …
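
For reference, a sketch of plain SignSGD with momentum, the common baseline form of the family the paper generalizes (this is not the paper's generalized algorithm; objective and hyperparameters are illustrative):

```python
import numpy as np

def sign_sgd(grad_fn, x0, lr=0.01, beta=0.9, iters=500):
    """SignSGD with momentum sketch: step in the direction of the sign of a
    momentum-averaged stochastic gradient. Baseline form only, not the
    generalized algorithm analyzed in the paper."""
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)
    for _ in range(iters):
        g = grad_fn(x)
        m = beta * m + (1 - beta) * g        # exponential moving average of gradients
        x -= lr * np.sign(m)                 # only the sign of each coordinate is used
    return x

# Toy quadratic: gradient of 0.5*||x - a||^2 with a bit of noise.
rng = np.random.default_rng(0)
a = np.array([2.0, -1.0])
grad = lambda x: (x - a) + 0.1 * rng.normal(size=x.shape)
print(sign_sgd(grad, np.zeros(2)))
```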

A guide through the zoo of biased SGD

Y Demidovich, G Malinovsky… - Advances in Neural …, 2023 - proceedings.neurips.cc
Stochastic Gradient Descent (SGD) is arguably the most important single algorithm
in modern machine learning. Although SGD with unbiased gradient estimators has been …
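
One common concrete source of bias, offered here purely as an illustration (the survey covers many other biased estimators): top-k gradient sparsification, where only the largest-magnitude coordinates are kept.

```python
import numpy as np

def top_k(g, k):
    """Keep only the k largest-magnitude coordinates of g: a biased gradient estimator."""
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]
    out[idx] = g[idx]
    return out

def biased_sgd(grad_fn, x0, k=1, lr=0.1, iters=1000):
    """SGD driven by a biased (top-k sparsified) gradient estimator."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x -= lr * top_k(grad_fn(x), k)
    return x

# Toy quadratic with exact gradients; the bias comes purely from the top-k compression.
a = np.array([1.0, -2.0, 0.5])
print(biased_sgd(lambda x: x - a, np.zeros(3), k=1))
```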

PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization

Z Li, H Bao, X Zhang… - … conference on machine …, 2021 - proceedings.mlr.press
In this paper, we propose a novel stochastic gradient estimator—ProbAbilistic Gradient
Estimator (PAGE)—for nonconvex optimization. PAGE is easy to implement as it is designed …
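
A sketch of a PAGE-style estimator as I understand it (the paper specifies the exact minibatch sizes and probability p; the full-gradient refresh and single-sample correction here are a simplified variant):

```python
import numpy as np

def page(grad_i, n, x0, lr=0.05, p=0.2, iters=500, seed=0):
    """Sketch of a PAGE-style estimator (my reading; see the paper for exact batch
    sizes): with probability p, refresh with a full gradient; otherwise reuse the
    previous estimate plus a cheap single-sample correction at the same index."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    full_grad = lambda z: np.mean([grad_i(z, i) for i in range(n)], axis=0)
    g = full_grad(x)                              # initial estimate: full gradient
    for _ in range(iters):
        x_new = x - lr * g
        if rng.random() < p:
            g = full_grad(x_new)                  # occasional expensive refresh
        else:
            i = rng.integers(n)                   # same sample at both iterates
            g = g + grad_i(x_new, i) - grad_i(x, i)
        x = x_new
    return x

# Toy finite-sum least squares: f_i(x) = 0.5*(a_i @ x - b_i)^2.
rng0 = np.random.default_rng(1)
A, b = rng0.normal(size=(40, 3)), rng0.normal(size=40)
print(page(lambda x, i: A[i] * (A[i] @ x - b[i]), 40, np.zeros(3)))
```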

High-probability bounds for stochastic optimization and variational inequalities: the case of unbounded variance

A Sadiev, M Danilova, E Gorbunov… - International …, 2023 - proceedings.mlr.press
In recent years, the interest of the optimization and machine learning communities in
high-probability convergence of stochastic optimization methods has been growing. One of …
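
Gradient clipping is the standard tool in this line of work for handling heavy-tailed or unbounded-variance noise; the following generic clipped-SGD sketch illustrates the idea and is not the paper's exact method (clip threshold, step size, and the toy heavy-tailed noise are illustrative):

```python
import numpy as np

def clipped_sgd(grad_fn, x0, lr=0.05, clip=1.0, iters=1000, seed=0):
    """Generic clipped SGD sketch: rescale any stochastic gradient whose norm exceeds
    `clip`. Clipping is the usual device for high-probability guarantees under
    heavy-tailed / unbounded-variance noise; not the paper's exact algorithm."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        g = grad_fn(x, rng)
        norm = np.linalg.norm(g)
        if norm > clip:
            g = g * (clip / norm)                # clip heavy-tailed gradient noise
        x -= lr * g
    return x

# Toy problem with heavy-tailed (Student-t, df=2, infinite variance) gradient noise.
a = np.array([1.0, -1.0])
grad = lambda x, rng: (x - a) + rng.standard_t(df=2, size=x.shape)
print(clipped_sgd(grad, np.zeros(2)))
```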

On the convergence of stochastic multi-objective gradient manipulation and beyond

S Zhou, W Zhang, J Jiang, W Zhong… - Advances in Neural …, 2022 - proceedings.neurips.cc
The conflicting gradients problem is one of the major bottlenecks for the effective training of
machine learning models that deal with multiple objectives. To resolve this problem, various …
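
As one well-known example of gradient manipulation for conflicting objectives (PCGrad-style projection; the paper analyzes the convergence of such methods more broadly), here is a two-task sketch: if two task gradients conflict, each is projected off the other's direction before combining.

```python
import numpy as np

def pcgrad_two_tasks(g1, g2):
    """PCGrad-style conflict resolution for two tasks: if the gradients conflict
    (negative inner product), project each onto the normal plane of the other."""
    g1, g2 = np.array(g1, float), np.array(g2, float)
    out1, out2 = g1.copy(), g2.copy()
    if g1 @ g2 < 0:                               # gradients point in conflicting directions
        out1 = g1 - (g1 @ g2) / (g2 @ g2) * g2    # remove g1's component along g2
        out2 = g2 - (g2 @ g1) / (g1 @ g1) * g1    # remove g2's component along g1
    return out1 + out2                            # combined update direction

# Two conflicting task gradients: the combined direction no longer opposes either task.
print(pcgrad_two_tasks([1.0, 0.0], [-0.5, 1.0]))
```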

Asynchronous SGD beats minibatch SGD under arbitrary delays

K Mishchenko, F Bach, M Even… - Advances in Neural …, 2022 - proceedings.neurips.cc
The existing analysis of asynchronous stochastic gradient descent (SGD) degrades
dramatically when any delay is large, giving the impression that performance depends …
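
A minimal simulation of the delayed-gradient setting studied here: each applied gradient was computed at a stale iterate from up to a few steps ago, mimicking slow asynchronous workers. The delay model and parameters are illustrative, not the paper's analysis setup:

```python
import numpy as np
from collections import deque

def delayed_sgd(grad_fn, x0, lr=0.05, max_delay=5, iters=500, seed=0):
    """Minimal simulation of asynchronous SGD: each applied gradient was computed at
    a stale iterate from up to `max_delay` steps ago, mimicking slow workers."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    history = deque([x.copy()], maxlen=max_delay + 1)   # recent iterates
    for _ in range(iters):
        stale = history[rng.integers(len(history))]     # worker read an old iterate
        x = x - lr * grad_fn(stale, rng)                # server applies the stale gradient
        history.append(x.copy())
    return x

# Toy noisy quadratic: gradient of 0.5*||x - a||^2 plus noise.
a = np.array([1.0, 2.0])
grad = lambda z, rng: (z - a) + 0.1 * rng.normal(size=z.shape)
print(delayed_sgd(grad, np.zeros(2)))
```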

Random reshuffling: Simple analysis with vast improvements

K Mishchenko, A Khaled… - Advances in Neural …, 2020 - proceedings.neurips.cc
Random Reshuffling (RR) is an algorithm for minimizing finite-sum functions that utilizes
iterative gradient descent steps in conjunction with data reshuffling. Often contrasted with its …
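
A minimal sketch of Random Reshuffling as described: each epoch draws a fresh permutation of the data and takes one gradient step per sample, in contrast to with-replacement sampling. The toy least-squares problem and step size are illustrative:

```python
import numpy as np

def random_reshuffling(grad_i, n, x0, lr=0.05, epochs=20, seed=0):
    """Random Reshuffling sketch: each epoch draws a fresh permutation of the data
    and takes one gradient step per sample, so every sample is used exactly once
    per epoch (unlike with-replacement SGD sampling)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(epochs):
        for i in rng.permutation(n):             # visit each sample once, in random order
            x -= lr * grad_i(x, i)
    return x

# Toy finite-sum least squares: f_i(x) = 0.5*(a_i @ x - b_i)^2.
rng0 = np.random.default_rng(1)
A, b = rng0.normal(size=(30, 3)), rng0.normal(size=30)
print(random_reshuffling(lambda x, i: A[i] * (A[i] @ x - b[i]), 30, np.zeros(3)))
```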