Recent advances in stochastic gradient descent in deep learning

Y Tian, Y Zhang, H Zhang - Mathematics, 2023 - mdpi.com
In the age of artificial intelligence, finding the best approach to handling huge amounts of data is a
tremendously motivating and hard problem. Among machine learning models, stochastic …
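
Since every entry below modifies the plain SGD update in one way or another, a minimal baseline sketch may help fix notation. The quadratic finite-sum objective and step size are illustrative choices, not taken from the survey:

```python
import numpy as np

def sgd(grad_fn, x0, n_samples, lr=0.1, epochs=5, seed=0):
    """Plain SGD: at each step, take a gradient step on one randomly drawn sample."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(epochs * n_samples):
        i = rng.integers(n_samples)          # sample an index uniformly with replacement
        x -= lr * grad_fn(x, i)              # stochastic gradient step
    return x

# Toy finite-sum least squares: f(x) = (1/n) * sum_i 0.5 * (a_i @ x - b_i)^2
rng = np.random.default_rng(1)
A, b = rng.normal(size=(50, 3)), rng.normal(size=50)
grad = lambda x, i: A[i] * (A[i] @ x - b[i])
print(sgd(grad, np.zeros(3), n_samples=50))
```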

Variance reduced ProxSkip: Algorithm, theory and application to federated learning

G Malinovsky, K Yi, P Richtárik - Advances in Neural …, 2022 - proceedings.neurips.cc
We study distributed optimization methods based on the local training (LT) paradigm,
i.e., methods which achieve communication efficiency by performing richer local gradient …
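
To make the LT paradigm concrete, here is a minimal local-SGD-style sketch: each client takes several local gradient steps between communication rounds and the server averages the results. This illustrates local training only, not the variance-reduced ProxSkip method of the paper; clients, step counts, and objectives are illustrative:

```python
import numpy as np

def local_sgd(client_grads, x0, lr=0.1, local_steps=10, rounds=20):
    """Local training (LT) sketch: each client runs several local gradient steps
    between communication rounds; the server then averages the local models.
    Illustrates the LT paradigm only, not the paper's variance-reduced ProxSkip."""
    x = np.array(x0, dtype=float)
    for _ in range(rounds):
        local_models = []
        for grad in client_grads:            # one gradient oracle per client
            y = x.copy()
            for _ in range(local_steps):     # richer local work, fewer communications
                y -= lr * grad(y)
            local_models.append(y)
        x = np.mean(local_models, axis=0)    # communication: average local iterates
    return x

# Two toy clients with quadratics centered at different points.
clients = [lambda y: y - 1.0, lambda y: y + 1.0]   # gradients of 0.5*(y-1)^2 and 0.5*(y+1)^2
print(local_sgd(clients, x0=np.zeros(2)))
```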

ProxSkip: Yes! Local gradient steps provably lead to communication acceleration! Finally!

K Mishchenko, G Malinovsky, S Stich… - International …, 2022 - proceedings.mlr.press
We introduce ProxSkip—a surprisingly simple and provably efficient method for minimizing
the sum of a smooth (f) and an expensive nonsmooth proximable (ψ) function. The …
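
A sketch of the ProxSkip-style update as I read it (step size, prox probability p, and the control variate h; consult the paper for the exact statement and theory). The L1 toy problem is an illustrative choice:

```python
import numpy as np

def prox_skip(grad_f, prox_psi, x0, lr=0.1, p=0.2, iters=2000, seed=0):
    """Sketch of a ProxSkip-style update (my reading of the method; see the paper
    for the exact statement): the expensive prox of psi is evaluated only with
    probability p, and a control variate h corrects for the skipped prox steps."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    h = np.zeros_like(x)                         # control variate
    for _ in range(iters):
        x_hat = x - lr * (grad_f(x) - h)         # gradient step, shifted by h
        if rng.random() < p:                     # occasionally pay for the prox
            x = prox_psi(x_hat - (lr / p) * h, lr / p)
        else:                                    # otherwise skip it
            x = x_hat
        h = h + (p / lr) * (x - x_hat)           # update the control variate
    return x

# Toy instance: f(x) = 0.5*||x - a||^2, psi(x) = lam*||x||_1 (soft-thresholding prox).
a, lam = np.array([1.0, -0.2, 0.05]), 0.1
grad_f = lambda x: x - a
prox_l1 = lambda v, step: np.sign(v) * np.maximum(np.abs(v) - step * lam, 0.0)
print(prox_skip(grad_f, prox_l1, np.zeros(3)))
```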

Robustness to unbounded smoothness of generalized SignSGD

M Crawshaw, M Liu, F Orabona… - Advances in Neural …, 2022 - proceedings.neurips.cc
Traditional analyses in non-convex optimization typically rely on the smoothness
assumption, namely requiring the gradients to be Lipschitz. However, recent evidence …
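
For reference, a sketch of plain SignSGD with momentum, the common baseline form of the family the paper generalizes (this is not the paper's generalized algorithm; objective and hyperparameters are illustrative):

```python
import numpy as np

def sign_sgd(grad_fn, x0, lr=0.01, beta=0.9, iters=500):
    """SignSGD with momentum sketch: step in the direction of the sign of a
    momentum-averaged stochastic gradient. Baseline form only, not the
    generalized algorithm analyzed in the paper."""
    x = np.array(x0, dtype=float)
    m = np.zeros_like(x)
    for _ in range(iters):
        g = grad_fn(x)
        m = beta * m + (1 - beta) * g        # exponential moving average of gradients
        x -= lr * np.sign(m)                 # only the sign of each coordinate is used
    return x

# Toy quadratic: gradient of 0.5*||x - a||^2 with a bit of noise.
rng = np.random.default_rng(0)
a = np.array([2.0, -1.0])
grad = lambda x: (x - a) + 0.1 * rng.normal(size=x.shape)
print(sign_sgd(grad, np.zeros(2)))
```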

A guide through the zoo of biased SGD

Y Demidovich, G Malinovsky… - Advances in Neural …, 2023 - proceedings.neurips.cc
Stochastic Gradient Descent (SGD) is arguably the most important single algorithm
in modern machine learning. Although SGD with unbiased gradient estimators has been …
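
One common concrete source of bias, offered here purely as an illustration (the survey covers many other biased estimators): top-k gradient sparsification, where only the largest-magnitude coordinates are kept.

```python
import numpy as np

def top_k(g, k):
    """Keep only the k largest-magnitude coordinates of g: a biased gradient estimator."""
    out = np.zeros_like(g)
    idx = np.argsort(np.abs(g))[-k:]
    out[idx] = g[idx]
    return out

def biased_sgd(grad_fn, x0, k=1, lr=0.1, iters=1000):
    """SGD driven by a biased (top-k sparsified) gradient estimator."""
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        x -= lr * top_k(grad_fn(x), k)
    return x

# Toy quadratic with exact gradients; the bias comes purely from the top-k compression.
a = np.array([1.0, -2.0, 0.5])
print(biased_sgd(lambda x: x - a, np.zeros(3), k=1))
```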

PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization

Z Li, H Bao, X Zhang… - … conference on machine …, 2021 - proceedings.mlr.press
In this paper, we propose a novel stochastic gradient estimator—ProbAbilistic Gradient
Estimator (PAGE)—for nonconvex optimization. PAGE is easy to implement as it is designed …
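
A sketch of a PAGE-style estimator as I understand it (the paper specifies the exact minibatch sizes and probability p; the full-gradient refresh and single-sample correction here are a simplified variant):

```python
import numpy as np

def page(grad_i, n, x0, lr=0.05, p=0.2, iters=500, seed=0):
    """Sketch of a PAGE-style estimator (my reading; see the paper for exact batch
    sizes): with probability p, refresh with a full gradient; otherwise reuse the
    previous estimate plus a cheap single-sample correction at the same index."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    full_grad = lambda z: np.mean([grad_i(z, i) for i in range(n)], axis=0)
    g = full_grad(x)                              # initial estimate: full gradient
    for _ in range(iters):
        x_new = x - lr * g
        if rng.random() < p:
            g = full_grad(x_new)                  # occasional expensive refresh
        else:
            i = rng.integers(n)                   # same sample at both iterates
            g = g + grad_i(x_new, i) - grad_i(x, i)
        x = x_new
    return x

# Toy finite-sum least squares: f_i(x) = 0.5*(a_i @ x - b_i)^2.
rng0 = np.random.default_rng(1)
A, b = rng0.normal(size=(40, 3)), rng0.normal(size=40)
print(page(lambda x, i: A[i] * (A[i] @ x - b[i]), 40, np.zeros(3)))
```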

High-probability bounds for stochastic optimization and variational inequalities: the case of unbounded variance

A Sadiev, M Danilova, E Gorbunov… - International …, 2023 - proceedings.mlr.press
In recent years, the interest of the optimization and machine learning communities in
high-probability convergence of stochastic optimization methods has been growing. One of …
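
Gradient clipping is the standard tool in this line of work for handling heavy-tailed or unbounded-variance noise; the following generic clipped-SGD sketch illustrates the idea and is not the paper's exact method (clip threshold, step size, and the toy heavy-tailed noise are illustrative):

```python
import numpy as np

def clipped_sgd(grad_fn, x0, lr=0.05, clip=1.0, iters=1000, seed=0):
    """Generic clipped SGD sketch: rescale any stochastic gradient whose norm exceeds
    `clip`. Clipping is the usual device for high-probability guarantees under
    heavy-tailed / unbounded-variance noise; not the paper's exact algorithm."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(iters):
        g = grad_fn(x, rng)
        norm = np.linalg.norm(g)
        if norm > clip:
            g = g * (clip / norm)                # clip heavy-tailed gradient noise
        x -= lr * g
    return x

# Toy problem with heavy-tailed (Student-t, df=2, infinite variance) gradient noise.
a = np.array([1.0, -1.0])
grad = lambda x, rng: (x - a) + rng.standard_t(df=2, size=x.shape)
print(clipped_sgd(grad, np.zeros(2)))
```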

On the convergence of stochastic multi-objective gradient manipulation and beyond

S Zhou, W Zhang, J Jiang, W Zhong… - Advances in Neural …, 2022 - proceedings.neurips.cc
The conflicting gradients problem is one of the major bottlenecks for the effective training of
machine learning models that deal with multiple objectives. To resolve this problem, various …
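
As one well-known example of gradient manipulation for conflicting objectives (PCGrad-style projection; the paper analyzes the convergence of such methods more broadly), here is a two-task sketch: if two task gradients conflict, each is projected off the other's direction before combining.

```python
import numpy as np

def pcgrad_two_tasks(g1, g2):
    """PCGrad-style conflict resolution for two tasks: if the gradients conflict
    (negative inner product), project each onto the normal plane of the other."""
    g1, g2 = np.array(g1, float), np.array(g2, float)
    out1, out2 = g1.copy(), g2.copy()
    if g1 @ g2 < 0:                               # gradients point in conflicting directions
        out1 = g1 - (g1 @ g2) / (g2 @ g2) * g2    # remove g1's component along g2
        out2 = g2 - (g2 @ g1) / (g1 @ g1) * g1    # remove g2's component along g1
    return out1 + out2                            # combined update direction

# Two conflicting task gradients: the combined direction no longer opposes either task.
print(pcgrad_two_tasks([1.0, 0.0], [-0.5, 1.0]))
```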

Asynchronous SGD beats minibatch SGD under arbitrary delays

K Mishchenko, F Bach, M Even… - Advances in Neural …, 2022 - proceedings.neurips.cc
The existing analysis of asynchronous stochastic gradient descent (SGD) degrades
dramatically when any delay is large, giving the impression that performance depends …
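
A minimal simulation of the delayed-gradient setting studied here: each applied gradient was computed at a stale iterate from up to a few steps ago, mimicking slow asynchronous workers. The delay model and parameters are illustrative, not the paper's analysis setup:

```python
import numpy as np
from collections import deque

def delayed_sgd(grad_fn, x0, lr=0.05, max_delay=5, iters=500, seed=0):
    """Minimal simulation of asynchronous SGD: each applied gradient was computed at
    a stale iterate from up to `max_delay` steps ago, mimicking slow workers."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    history = deque([x.copy()], maxlen=max_delay + 1)   # recent iterates
    for _ in range(iters):
        stale = history[rng.integers(len(history))]     # worker read an old iterate
        x = x - lr * grad_fn(stale, rng)                # server applies the stale gradient
        history.append(x.copy())
    return x

# Toy noisy quadratic: gradient of 0.5*||x - a||^2 plus noise.
a = np.array([1.0, 2.0])
grad = lambda z, rng: (z - a) + 0.1 * rng.normal(size=z.shape)
print(delayed_sgd(grad, np.zeros(2)))
```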

Random reshuffling: Simple analysis with vast improvements

K Mishchenko, A Khaled… - Advances in Neural …, 2020 - proceedings.neurips.cc
Random Reshuffling (RR) is an algorithm for minimizing finite-sum functions that utilizes
iterative gradient descent steps in conjunction with data reshuffling. Often contrasted with its …
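
A minimal sketch of Random Reshuffling as described: each epoch draws a fresh permutation of the data and takes one gradient step per sample, in contrast to with-replacement sampling. The toy least-squares problem and step size are illustrative:

```python
import numpy as np

def random_reshuffling(grad_i, n, x0, lr=0.05, epochs=20, seed=0):
    """Random Reshuffling sketch: each epoch draws a fresh permutation of the data
    and takes one gradient step per sample, so every sample is used exactly once
    per epoch (unlike with-replacement SGD sampling)."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(epochs):
        for i in rng.permutation(n):             # visit each sample once, in random order
            x -= lr * grad_i(x, i)
    return x

# Toy finite-sum least squares: f_i(x) = 0.5*(a_i @ x - b_i)^2.
rng0 = np.random.default_rng(1)
A, b = rng0.normal(size=(30, 3)), rng0.normal(size=30)
print(random_reshuffling(lambda x, i: A[i] * (A[i] @ x - b[i]), 30, np.zeros(3)))
```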