Recent advances in stochastic gradient descent in deep learning
In the age of artificial intelligence, finding the best approach to handling huge amounts of data is a
tremendously motivating and hard problem. Among machine learning models, stochastic …
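For reference, a single mini-batch SGD step has the form $w \leftarrow w - \gamma \nabla f_i(w)$, where $f_i$ is the loss on a randomly sampled batch. A minimal sketch in Python (the least-squares loss, step size, and batch size are illustrative placeholders, not choices taken from this survey):

    import numpy as np

    def sgd_step(w, X_batch, y_batch, lr=0.1):
        # Stochastic gradient of a mini-batch least-squares loss
        # L(w) = ||X_batch @ w - y_batch||^2 / (2 * batch_size).
        grad = X_batch.T @ (X_batch @ w - y_batch) / len(y_batch)
        return w - lr * grad

    rng = np.random.default_rng(0)
    X, y = rng.normal(size=(1000, 5)), rng.normal(size=1000)
    w = np.zeros(5)
    for _ in range(200):
        idx = rng.choice(len(y), size=32, replace=False)  # sample a mini-batch
        w = sgd_step(w, X[idx], y[idx])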
Variance-reduced methods for machine learning
Stochastic optimization lies at the heart of machine learning, and its cornerstone is
stochastic gradient descent (SGD), a method introduced over 60 years ago. The last eight …
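To make "variance reduction" concrete, here is a hedged SVRG-style sketch, one common member of this family; the snapshot schedule and quadratic loss are illustrative assumptions rather than details from this survey:

    import numpy as np

    def grad(w, X, y):
        # Gradient of a least-squares loss on the rows (X, y).
        return X.T @ (X @ w - y) / len(y)

    def svrg_epoch(w, X, y, lr=0.1, inner_steps=100, seed=0):
        rng = np.random.default_rng(seed)
        w_snap = w.copy()
        mu = grad(w_snap, X, y)                 # full gradient at the snapshot
        for _ in range(inner_steps):
            i = rng.integers(len(y))
            xi, yi = X[i:i + 1], y[i:i + 1]
            # Stochastic gradient corrected by the snapshot: unbiased, lower variance.
            g = grad(w, xi, yi) - grad(w_snap, xi, yi) + mu
            w = w - lr * g
        return w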
Towards understanding sharpness-aware minimization
M Andriushchenko… - … Conference on Machine …, 2022 - proceedings.mlr.press
Sharpness-Aware Minimization (SAM) is a recent training method that relies on
worst-case weight perturbations and significantly improves generalization in various …
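A hedged sketch of the ascent/descent update commonly associated with SAM: perturb the weights in the worst-case (gradient) direction within a ball of radius rho, then descend using the gradient taken at the perturbed point. The radius and the loss function here are placeholders, not values analyzed in the paper:

    import numpy as np

    def sam_step(w, grad_fn, lr=0.1, rho=0.05):
        g = grad_fn(w)
        # Ascent step: approximate worst-case perturbation within an L2 ball of radius rho.
        eps = rho * g / (np.linalg.norm(g) + 1e-12)
        # Descent step: gradient at the perturbed weights, applied to the original weights.
        return w - lr * grad_fn(w + eps)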
A unified theory of decentralized SGD with changing topology and local updates
Decentralized stochastic optimization methods have gained a lot of attention recently, mainly
because of their cheap per-iteration cost, data locality, and communication efficiency. In …
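A minimal sketch of one round of decentralized SGD with gossip averaging; the doubly stochastic mixing matrix W (encoding the possibly changing topology) and the local gradient oracles are assumptions made for illustration:

    import numpy as np

    def decentralized_sgd_round(X_nodes, W, grad_fns, lr=0.1):
        # X_nodes: (n_nodes, dim) array of local iterates, one row per node.
        # W: doubly stochastic mixing matrix over the communication graph.
        n = len(X_nodes)
        # Each node takes a local stochastic gradient step on its own data ...
        stepped = np.stack([X_nodes[i] - lr * grad_fns[i](X_nodes[i]) for i in range(n)])
        # ... and then averages with its neighbors according to W.
        return W @ stepped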
ProxSkip: Yes! Local gradient steps provably lead to communication acceleration! Finally!
We introduce ProxSkip, a surprisingly simple and provably efficient method for minimizing
the sum of a smooth ($f$) and an expensive nonsmooth proximable ($\psi$) function. The …
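A hedged sketch of the prox-skipping idea: take a gradient step on $f$ at every iteration, apply the expensive prox of $\psi$ only with small probability $p$, and use a control variate $h$ to correct for the skipped steps. Notation and step-size choices below follow common conventions and are not taken verbatim from the paper:

    import numpy as np

    def proxskip(x0, grad_f, prox_psi, gamma=0.1, p=0.2, iters=1000, seed=0):
        # prox_psi(v, s) is assumed to return prox_{s * psi}(v).
        rng = np.random.default_rng(seed)
        x, h = x0.copy(), np.zeros_like(x0)
        for _ in range(iters):
            x_hat = x - gamma * (grad_f(x) - h)       # gradient step, shifted by the control variate
            if rng.random() < p:                      # expensive prox step, taken only rarely
                x = prox_psi(x_hat - (gamma / p) * h, gamma / p)
            else:                                     # otherwise skip it
                x = x_hat
            h = h + (p / gamma) * (x - x_hat)         # control-variate update
        return x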
Tighter theory for local SGD on identical and heterogeneous data
We provide a new analysis of local SGD, removing unnecessary assumptions and
elaborating on the difference between two data regimes: identical and heterogeneous. In …
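For concreteness, one communication round of local SGD with periodic averaging, the scheme this kind of analysis covers; the number of local steps and the local gradient oracles are illustrative assumptions:

    import numpy as np

    def local_sgd_round(w_global, local_grad_fns, lr=0.05, local_steps=10):
        # Each worker starts from the shared model, runs several SGD steps on its
        # own (identical or heterogeneous) data, and the results are averaged.
        finals = []
        for grad_fn in local_grad_fns:
            w = w_global.copy()
            for _ in range(local_steps):
                w = w - lr * grad_fn(w)
            finals.append(w)
        return np.mean(finals, axis=0)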
Federated learning of a mixture of global and local models
We propose a new optimization formulation for training federated learning models. The
standard formulation has the form of an empirical risk minimization problem constructed to …
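One standard way to write such a mixture objective couples local models $x_1,\dots,x_n$ to their mean through a penalty; the exact form below is an assumption for illustration, with $\lambda$ interpolating between purely local models at $\lambda = 0$ and a single global model as $\lambda \to \infty$:

    $\min_{x_1,\dots,x_n} \; \frac{1}{n}\sum_{i=1}^n f_i(x_i) \;+\; \frac{\lambda}{2n}\sum_{i=1}^n \|x_i - \bar{x}\|^2, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^n x_i.$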
Federated learning with compression: Unified analysis and sharp guarantees
In federated learning, communication cost is often a critical bottleneck when scaling up distributed
optimization algorithms to collaboratively learn a model from millions of devices with …
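A hedged sketch of the kind of unbiased compression operator such unified analyses typically cover, here random-k sparsification of client updates; the operator choice and the value of k are illustrative assumptions:

    import numpy as np

    def rand_k(v, k, rng):
        # Keep k random coordinates and rescale by d/k so that E[C(v)] = v.
        d = len(v)
        out = np.zeros_like(v)
        idx = rng.choice(d, size=k, replace=False)
        out[idx] = v[idx] * (d / k)
        return out

    # Usage: clients send compressed updates, the server averages them.
    rng = np.random.default_rng(0)
    updates = [rng.normal(size=100) for _ in range(10)]
    aggregated = np.mean([rand_k(u, k=10, rng=rng) for u in updates], axis=0)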
The effect of choosing optimizer algorithms to improve computer vision tasks: a comparative study
Optimization algorithms are used to improve model accuracy. The optimization process
undergoes multiple cycles until convergence. A variety of optimization strategies have been …
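For reference, the per-step updates of two optimizers such comparisons typically include, SGD with momentum and Adam; the hyperparameters below are common defaults, not values from the study:

    import numpy as np

    def sgd_momentum_step(w, g, state, lr=0.01, beta=0.9):
        state["v"] = beta * state.get("v", 0.0) + g          # velocity accumulates past gradients
        return w - lr * state["v"], state

    def adam_step(w, g, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
        t = state.get("t", 0) + 1
        m = b1 * state.get("m", 0.0) + (1 - b1) * g          # first-moment estimate
        v = b2 * state.get("v", 0.0) + (1 - b2) * g ** 2     # second-moment estimate
        state.update(t=t, m=m, v=v)
        m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)  # bias correction
        return w - lr * m_hat / (np.sqrt(v_hat) + eps), state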
Optimal client sampling for federated learning
It is well understood that client-master communication can be a primary bottleneck in
Federated Learning. In this work, we address this issue with a novel client subsampling …
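A hedged sketch of importance-based client subsampling with unbiased reweighting; sampling probabilities proportional to update norms are one natural choice, used here purely to illustrate the general idea (in a real system the probabilities must be obtained without first collecting every update):

    import numpy as np

    def sampled_average(updates, m, rng):
        # Estimate the average of n client updates from only m sampled clients.
        # Each sampled update is reweighted by 1/(n * p_i) so the estimate is unbiased.
        n = len(updates)
        norms = np.array([np.linalg.norm(u) for u in updates])
        probs = norms / norms.sum()
        chosen = rng.choice(n, size=m, replace=True, p=probs)
        return sum(updates[i] / (n * probs[i]) for i in chosen) / m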