Recent advances in stochastic gradient descent in deep learning
In the age of artificial intelligence, finding the best approach to handling huge amounts of data is a tremendously motivating and hard problem. Among machine learning models, stochastic …
Variance reduced ProxSkip: Algorithm, theory and application to federated learning
We study distributed optimization methods based on the local training (LT) paradigm, i.e., methods which achieve communication efficiency by performing richer local gradient …
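To make the local training (LT) paradigm concrete, here is a minimal local-SGD / FedAvg-style sketch in which each client takes several local gradient steps between communication rounds and the server averages the resulting models. The quadratic client objectives, the `client_grads` interface, and the step-size and round counts are illustrative assumptions; this is the generic LT template, not the paper's variance-reduced method.

```python
import numpy as np

def local_sgd(client_grads, x0, lr, local_steps, num_rounds):
    """Local training (LT) sketch in the local-SGD / FedAvg style: each client
    runs several local gradient steps, then the server averages the models.
    Generic LT template, not the paper's variance-reduced method."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_rounds):
        local_models = []
        for grad in client_grads:          # each client starts from the server model
            y = x.copy()
            for _ in range(local_steps):   # richer local work between communications
                y = y - lr * grad(y)
            local_models.append(y)
        x = sum(local_models) / len(local_models)  # one communication: average models
    return x

# Toy clients: client i has f_i(x) = 0.5 * ||x - c_i||^2 (illustrative assumption).
C = np.array([[1.0, 0.0], [0.0, 2.0], [-1.0, 1.0]])
client_grads = [lambda x, ci=ci: x - ci for ci in C]
print(local_sgd(client_grads, np.zeros(2), lr=0.1, local_steps=10, num_rounds=50))
print(C.mean(axis=0))  # on this toy problem the iterates approach the average minimizer
```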
ProxSkip: Yes! Local gradient steps provably lead to communication acceleration! Finally!
We introduce ProxSkip, a surprisingly simple and provably efficient method for minimizing the sum of a smooth ($f$) and an expensive nonsmooth proximable ($\psi$) function. The …
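A small sketch of the prox-skipping idea: take a shifted gradient step on $f$ every iteration, call the expensive prox of $\psi$ only with probability $p$, and maintain a control variate $h$ so the rare prox calls stay consistent. The update rule follows the commonly cited form of the algorithm; the quadratic test problem, soft-thresholding prox, and the particular step size and probability are illustrative assumptions.

```python
import numpy as np

def prox_skip(grad_f, prox_psi, x0, gamma, p, num_iters, rng=None):
    """Prox-skipping sketch: gradient steps on f every iteration, but the
    (expensive) prox of psi is applied only with probability p, with a
    control variate h correcting for the skipped prox calls."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    h = np.zeros_like(x)                      # control variate
    for _ in range(num_iters):
        x_hat = x - gamma * (grad_f(x) - h)   # shifted gradient step
        if rng.random() < p:                  # rare prox step
            x = prox_psi(x_hat - (gamma / p) * h, gamma / p)
        else:                                 # cheap skip step
            x = x_hat
        h = h + (p / gamma) * (x - x_hat)     # update control variate
    return x

# Illustrative problem (assumption): f(x) = 0.5 * ||A x - b||^2, psi = lam * ||x||_1
A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([1.0, -1.0])
lam = 0.1
grad_f = lambda x: A.T @ (A @ x - b)
prox_psi = lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t * lam, 0.0)
print(prox_skip(grad_f, prox_psi, np.zeros(2), gamma=0.05, p=0.2, num_iters=2000))
```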
Robustness to unbounded smoothness of generalized SignSGD
Traditional analyses in non-convex optimization typically rely on the smoothness
assumption, namely requiring the gradients to be Lipschitz. However, recent evidence …
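As a point of reference for sign-based methods, here is a plain signSGD-with-momentum sketch applied to a toy objective whose gradient has no global Lipschitz constant, mimicking the "unbounded smoothness" setting. The specific update (sign of a momentum buffer), the toy objective, and all hyperparameters are illustrative assumptions, not the paper's generalized SignSGD.

```python
import numpy as np

def signsgd_momentum(grad, x0, lr=0.01, beta=0.9, num_iters=500):
    """Plain signSGD with momentum: step along the sign of a momentum
    buffer, so the step size is decoupled from the gradient magnitude."""
    x = np.asarray(x0, dtype=float)
    m = np.zeros_like(x)
    for _ in range(num_iters):
        m = beta * m + (1.0 - beta) * grad(x)  # momentum of the gradient
        x = x - lr * np.sign(m)                # sign step ignores magnitude
    return x

# Toy non-convex objective f(x) = x^4 - 3x^2; its gradient grows like x^3,
# so there is no global Lipschitz constant (illustrative assumption).
grad = lambda x: 4 * x**3 - 6 * x
print(signsgd_momentum(grad, np.array([3.0])))
```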
A guide through the zoo of biased SGD
Stochastic Gradient Descent (SGD) is arguably the most important single algorithm in modern machine learning. Although SGD with unbiased gradient estimators has been …
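One concrete example of a biased gradient estimator is top-$k$ sparsification, which keeps only the largest-magnitude coordinates; the sketch below runs SGD with that estimator on a toy quadratic. The top-$k$ choice and the toy problem are illustrative assumptions, one point in the "zoo" rather than the paper's general framework.

```python
import numpy as np

def top_k(g, k):
    """Keep the k largest-magnitude coordinates of g: a common *biased*
    gradient estimator (its expectation is not the true gradient)."""
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]
    out[idx] = g[idx]
    return out

def biased_sgd(grad, x0, lr, k, num_iters):
    """SGD driven by the biased top-k estimator."""
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = x - lr * top_k(grad(x), k)   # SGD step with a biased estimator
    return x

# Toy problem (assumption): f(x) = 0.5 * ||x - c||^2, minimizer c.
c = np.array([1.0, -2.0, 3.0, 0.5])
grad = lambda x: x - c
print(biased_sgd(grad, np.zeros(4), lr=0.5, k=2, num_iters=200))
```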
PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization
In this paper, we propose a novel stochastic gradient estimator—ProbAbilistic Gradient
Estimator (PAGE)—for nonconvex optimization. PAGE is easy to implement as it is designed …
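A sketch of a PAGE-style estimator as it is commonly described: with a small probability the full gradient is recomputed, and otherwise the previous estimator is reused plus a cheap minibatch gradient difference between consecutive iterates. The finite-sum interface `grads[i](x)`, the toy quadratics, and the hyperparameters are illustrative assumptions.

```python
import numpy as np

def page(grads, x0, eta, p, batch_size, num_iters, rng=None):
    """PAGE-style estimator sketch for a finite sum f = (1/n) sum_i f_i,
    where grads[i](x) returns the gradient of f_i at x (assumed interface)."""
    rng = np.random.default_rng() if rng is None else rng
    n = len(grads)
    full_grad = lambda x: sum(g(x) for g in grads) / n
    x = np.asarray(x0, dtype=float)
    g = full_grad(x)                              # start from a full gradient
    for _ in range(num_iters):
        x_new = x - eta * g
        if rng.random() < p:
            g = full_grad(x_new)                  # occasional full refresh
        else:
            idx = rng.choice(n, size=batch_size, replace=False)
            # reuse g plus a minibatch gradient difference (same batch at both points)
            g = g + sum(grads[i](x_new) - grads[i](x) for i in idx) / batch_size
        x = x_new
    return x

# Toy finite sum (assumption): f_i(x) = 0.5 * ||x - c_i||^2, minimizer = mean of c_i.
rng = np.random.default_rng(0)
C = rng.normal(size=(20, 3))
grads = [lambda x, ci=ci: x - ci for ci in C]
print(page(grads, np.zeros(3), eta=0.5, p=0.2, batch_size=4, num_iters=300))
print(C.mean(axis=0))
```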
High-probability bounds for stochastic optimization and variational inequalities: the case of unbounded variance
In recent years, interest in the high-probability convergence of stochastic optimization methods has been growing in the optimization and machine learning communities. One of …
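Gradient clipping is the standard device for high-probability guarantees under heavy-tailed noise, so the sketch below runs clipped SGD on a toy quadratic with Student-t gradient noise whose variance is infinite. The clipping rule, the noise model, and the hyperparameters are illustrative assumptions rather than the paper's exact algorithms.

```python
import numpy as np

def clipped_sgd(stoch_grad, x0, lr, clip_level, num_iters, rng=None):
    """Clipped SGD sketch: rescale each stochastic gradient so its norm
    never exceeds clip_level, which tames heavy-tailed gradient noise."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        g = stoch_grad(x, rng)
        norm = np.linalg.norm(g)
        if norm > clip_level:
            g = g * (clip_level / norm)   # clip: keep direction, cap magnitude
        x = x - lr * g
    return x

# Toy problem (assumption): f(x) = 0.5 * ||x||^2 with Student-t noise,
# df = 1.5, so the noise variance is infinite ("unbounded variance").
stoch_grad = lambda x, rng: x + rng.standard_t(df=1.5, size=x.shape)
print(clipped_sgd(stoch_grad, np.ones(3) * 5.0, lr=0.05, clip_level=1.0, num_iters=2000))
```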
On the convergence of stochastic multi-objective gradient manipulation and beyond
S Zhou, W Zhang, J Jiang, W Zhong… - Advances in Neural …, 2022 - proceedings.neurips.cc
The conflicting gradients problem is one of the major bottlenecks for the effective training of
machine learning models that deal with multiple objectives. To resolve this problem, various …
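A PCGrad-style projection is one simple instance of gradient manipulation for conflicting objectives: when two task gradients have a negative inner product, the conflicting component is projected away before combining them. The projection rule and the toy gradients below are illustrative assumptions, one example from the family of methods the paper analyzes.

```python
import numpy as np

def combine_conflicting(grads):
    """PCGrad-style gradient manipulation sketch: if two task gradients
    conflict (negative inner product), project one onto the normal plane
    of the other before averaging. One illustrative rule among many."""
    adjusted = [g.copy() for g in grads]
    for i, gi in enumerate(adjusted):
        for gj in grads:
            dot = gi @ gj
            if dot < 0:                         # conflicting directions
                gi -= dot / (gj @ gj) * gj      # remove the conflicting component
        adjusted[i] = gi
    return sum(adjusted) / len(adjusted)

# Two toy task gradients with a negative inner product (assumption).
g1 = np.array([1.0, -2.0])
g2 = np.array([-1.5, 1.0])
print(combine_conflicting([g1, g2]))
```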
Asynchronous SGD beats minibatch SGD under arbitrary delays
The existing analysis of asynchronous stochastic gradient descent (SGD) degrades
dramatically when any delay is large, giving the impression that performance depends …
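To illustrate the delayed-gradient model behind asynchronous SGD, here is a small simulation sketch in which each applied gradient was computed at a stale iterate from up to `max_delay` steps in the past. The random-delay model, the toy quadratic, and the hyperparameters are illustrative assumptions, not the paper's analysis setup.

```python
import numpy as np
from collections import deque

def delayed_sgd(grad, x0, lr, max_delay, num_iters, rng=None):
    """Simulation sketch of asynchronous SGD: the gradient applied at step t
    was computed at a stale iterate from up to max_delay steps ago."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    history = deque([x.copy()], maxlen=max_delay + 1)  # recent iterates
    for _ in range(num_iters):
        stale = history[rng.integers(len(history))]    # pick a random stale iterate
        x = x - lr * grad(stale)                       # apply the delayed gradient
        history.append(x.copy())
    return x

# Toy quadratic (assumption): f(x) = 0.5 * ||x - c||^2.
c = np.array([2.0, -1.0])
grad = lambda x: x - c
print(delayed_sgd(grad, np.zeros(2), lr=0.1, max_delay=5, num_iters=500))
```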
Random reshuffling: Simple analysis with vast improvements
Random Reshuffling (RR) is an algorithm for minimizing finite-sum functions that utilizes
iterative gradient descent steps in conjunction with data reshuffling. Often contrasted with its …
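A minimal Random Reshuffling sketch: every epoch draws a fresh permutation of the n component functions and performs one gradient step per component, i.e., sampling without replacement. The finite-sum interface and the toy quadratics are illustrative assumptions.

```python
import numpy as np

def random_reshuffling(grads, x0, lr, num_epochs, rng=None):
    """Random Reshuffling sketch: each epoch uses a fresh random permutation
    of the n components and takes one gradient step per component, so every
    data point is used exactly once per epoch (without replacement)."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    n = len(grads)
    for _ in range(num_epochs):
        for i in rng.permutation(n):   # new random order every epoch
            x = x - lr * grads[i](x)
    return x

# Toy finite sum (assumption): f_i(x) = 0.5 * ||x - c_i||^2, minimizer = mean of c_i.
rng = np.random.default_rng(1)
C = rng.normal(size=(10, 2))
grads = [lambda x, ci=ci: x - ci for ci in C]
print(random_reshuffling(grads, np.zeros(2), lr=0.05, num_epochs=100))
print(C.mean(axis=0))
```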