Recent advances in stochastic gradient descent in deep learning

Y Tian, Y Zhang, H Zhang - Mathematics, 2023 - mdpi.com
In the age of artificial intelligence, finding the best approach to handling huge amounts of data is a
tremendously motivating and hard problem. Among machine learning models, stochastic …
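
Since plain SGD is the common thread of the entries below, a minimal sketch may help fix notation: the update w ← w − η ∇f_i(w) applied to a toy least-squares problem. The function name `sgd` and the synthetic data are illustrative assumptions, not taken from the survey.

```python
import numpy as np

def sgd(X, y, lr=0.05, epochs=50, seed=0):
    """Plain SGD on least squares: one sampled example per update, w <- w - lr * grad_i(w)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        for i in rng.permutation(n):                  # one pass over shuffled data
            grad_i = 2.0 * (X[i] @ w - y[i]) * X[i]   # gradient of the i-th squared error
            w -= lr * grad_i
    return w

# Toy usage: recover a planted weight vector from noisy linear measurements.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
w_true = rng.normal(size=5)
y = X @ w_true + 0.01 * rng.normal(size=200)
print(np.linalg.norm(sgd(X, y) - w_true))             # small, up to the noise floor
```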

Variance-reduced methods for machine learning

RM Gower, M Schmidt, F Bach… - Proceedings of the …, 2020 - ieeexplore.ieee.org
Stochastic optimization lies at the heart of machine learning, and its cornerstone is
stochastic gradient descent (SGD), a method introduced over 60 years ago. The last eight …
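
As a concrete instance of the variance-reduced estimators this review surveys, here is an SVRG-style sketch (one family of methods covered, not code from the paper); the step size, epoch count, and toy least-squares problem are illustrative assumptions.

```python
import numpy as np

def svrg(grad_i, full_grad, w0, n, lr=0.005, epochs=30, seed=0):
    """SVRG-style sketch: inner steps use the variance-reduced estimate
    v = grad_i(w) - grad_i(w_snap) + full_grad(w_snap), which stays unbiased
    but has vanishing variance as w approaches the snapshot w_snap."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(epochs):
        w_snap = w.copy()
        mu = full_grad(w_snap)                 # full gradient at the snapshot
        for _ in range(n):
            i = rng.integers(n)
            v = grad_i(w, i) - grad_i(w_snap, i) + mu
            w -= lr * v
    return w

# Toy least-squares usage.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5)); w_true = rng.normal(size=5); y = X @ w_true
gi = lambda w, i: 2.0 * (X[i] @ w - y[i]) * X[i]
fg = lambda w: 2.0 * X.T @ (X @ w - y) / len(y)
print(np.linalg.norm(svrg(gi, fg, np.zeros(5), len(y)) - w_true))   # should be near zero
```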

SCAFFOLD: Stochastic controlled averaging for federated learning

SP Karimireddy, S Kale, M Mohri… - International …, 2020 - proceedings.mlr.press
Federated learning is a key scenario in modern large-scale machine learning where the
data remains distributed over a large number of clients and the task is to learn a centralized …
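
The core of SCAFFOLD is a control-variate correction of each client's local steps. Below is a condensed sketch of one round with full participation; the step sizes, the quadratic clients, and the function names are illustrative assumptions, and the control-variate refresh follows the paper's "option II" as I recall it.

```python
import numpy as np

def scaffold_round(x, c, clients, c_i, lr_local=0.05, lr_global=1.0, local_steps=10):
    """One SCAFFOLD round with full participation: every local step uses the
    drift-corrected gradient g_i(y) - c_i + c, then client and server control
    variates are refreshed from the distance travelled locally."""
    deltas_y, deltas_c = [], []
    for i, grad_fn in enumerate(clients):
        y = x.copy()
        for _ in range(local_steps):
            y -= lr_local * (grad_fn(y) - c_i[i] + c)              # corrected local step
        c_new = c_i[i] - c + (x - y) / (local_steps * lr_local)    # control-variate refresh
        deltas_y.append(y - x)
        deltas_c.append(c_new - c_i[i])
        c_i[i] = c_new
    x = x + lr_global * np.mean(deltas_y, axis=0)                  # server model update
    c = c + np.mean(deltas_c, axis=0)                              # server control variate
    return x, c

# Toy usage: two clients with heterogeneous quadratic objectives 0.5*w'A_i w - b_i'w.
A = [np.diag([1.0, 3.0]), np.diag([3.0, 1.0])]
b = [np.array([1.0, 0.0]), np.array([0.0, 1.0])]
clients = [lambda w, A=A[i], b=b[i]: A @ w - b for i in range(2)]  # per-client gradients
x, c = np.zeros(2), np.zeros(2)
c_i = [np.zeros(2), np.zeros(2)]
for _ in range(50):
    x, c = scaffold_round(x, c, clients, c_i)
print(x)   # close to [0.25, 0.25], the minimizer of the averaged objective
```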

AdaBelief optimizer: Adapting stepsizes by the belief in observed gradients

J Zhuang, T Tang, Y Ding… - Advances in neural …, 2020 - proceedings.neurips.cc
Most popular optimizers for deep learning can be broadly categorized as adaptive methods
(e.g., Adam) and accelerated schemes (e.g., stochastic gradient descent (SGD) with …
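
The "belief" idea is a small change to Adam: track the squared deviation of the gradient from its EMA instead of the raw squared gradient. A NumPy sketch of that update follows; the hyperparameters and noisy-quadratic usage are illustrative assumptions, and the extra ε inside the s recursion is included as in my reading of the paper's algorithm.

```python
import numpy as np

def adabelief_step(w, g, state, lr=1e-3, betas=(0.9, 0.999), eps=1e-8):
    """One AdaBelief update: identical to Adam except that the second moment s
    tracks the squared deviation (g - m)**2 of the gradient from its own EMA."""
    m, s, t = state["m"], state["s"], state["t"] + 1
    m = betas[0] * m + (1 - betas[0]) * g
    s = betas[1] * s + (1 - betas[1]) * (g - m) ** 2 + eps   # "belief" in the observed gradient
    m_hat = m / (1 - betas[0] ** t)                          # bias corrections, as in Adam
    s_hat = s / (1 - betas[1] ** t)
    state.update(m=m, s=s, t=t)
    return w - lr * m_hat / (np.sqrt(s_hat) + eps)

# Toy usage: noisy gradients of (w - 3)^2; w should settle near the minimizer 3.0.
rng = np.random.default_rng(0)
w, state = np.zeros(1), {"m": np.zeros(1), "s": np.zeros(1), "t": 0}
for _ in range(2000):
    g = 2.0 * (w - 3.0) + rng.normal()
    w = adabelief_step(w, g, state, lr=0.01)
print(w)
```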

Learning to reweight examples for robust deep learning

M Ren, W Zeng, B Yang… - … conference on machine …, 2018 - proceedings.mlr.press
Deep neural networks have been shown to be very powerful modeling tools for many
supervised learning tasks involving complex input patterns. However, they can also easily …

Federated optimization: Distributed machine learning for on-device intelligence

J Konečný, HB McMahan, D Ramage… - arXiv preprint arXiv …, 2016 - arxiv.org
We introduce a new and increasingly relevant setting for distributed optimization in machine
learning, where the data defining the optimization are unevenly distributed over an …
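
To make the setting concrete, here is a minimal local-training-plus-weighted-averaging round in the spirit of federated averaging; this is a generic sketch of the scenario with unevenly sized client shards, not the specific algorithms the paper develops, and all names and the toy task are assumptions.

```python
import numpy as np

def federated_round(x, client_data, lr=0.05, local_steps=10):
    """One round of local training plus weighted model averaging: each client
    runs gradient steps on its own shard, and the server averages by shard size."""
    models, sizes = [], []
    for X, y in client_data:
        w = x.copy()
        for _ in range(local_steps):
            g = 2.0 * X.T @ (X @ w - y) / len(y)      # local least-squares gradient
            w -= lr * g
        models.append(w)
        sizes.append(len(y))
    return np.average(models, axis=0, weights=sizes)  # size-weighted average

# Toy usage: three clients holding unevenly sized shards of the same linear task.
rng = np.random.default_rng(0)
w_true = rng.normal(size=4)
client_data = []
for n in (10, 50, 200):                               # uneven data distribution
    X = rng.normal(size=(n, 4))
    client_data.append((X, X @ w_true))
x = np.zeros(4)
for _ in range(100):
    x = federated_round(x, client_data)
print(np.linalg.norm(x - w_true))                     # should be close to zero
```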

Large batch optimization for deep learning: Training BERT in 76 minutes

Y You, J Li, S Reddi, J Hseu, S Kumar… - arXiv preprint arXiv …, 2019 - arxiv.org
Training large deep neural networks on massive datasets is computationally very
challenging. There has been a recent surge of interest in using large-batch stochastic …
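
The key ingredient in this line of work is a layer-wise trust ratio that rescales an Adam-like direction by ||w|| / ||direction||, so large-batch steps stay proportionate for every layer. The sketch below is an approximation from memory, not the paper's reference implementation: the clipping function φ is taken as the identity and the zero-norm fallback is an assumption.

```python
import numpy as np

def lamb_step(params, grads, state, lr=1e-2, betas=(0.9, 0.999), eps=1e-6, wd=0.01):
    """LAMB-style update sketch: an Adam-like direction per layer, rescaled by the
    layer-wise trust ratio ||w|| / ||direction|| so each layer takes a step of
    comparable relative size even with very large batches."""
    state["t"] += 1
    t = state["t"]
    new_params = []
    for k, (w, g) in enumerate(zip(params, grads)):
        m = betas[0] * state["m"][k] + (1 - betas[0]) * g
        v = betas[1] * state["v"][k] + (1 - betas[1]) * g * g
        state["m"][k], state["v"][k] = m, v
        m_hat = m / (1 - betas[0] ** t)
        v_hat = v / (1 - betas[1] ** t)
        u = m_hat / (np.sqrt(v_hat) + eps) + wd * w          # Adam direction + weight decay
        trust = np.linalg.norm(w) / (np.linalg.norm(u) + 1e-12)
        trust = trust if trust > 0 else 1.0                  # assumed fallback for zero-norm layers
        new_params.append(w - lr * trust * u)
    return new_params

# Toy usage: two parameter groups ("layers") on a simple quadratic bowl.
params = [np.full(3, 5.0), np.full(2, -2.0)]
state = {"m": [np.zeros_like(p) for p in params],
         "v": [np.zeros_like(p) for p in params], "t": 0}
for _ in range(500):
    grads = [2.0 * p for p in params]                        # gradient of sum(p**2) per layer
    params = lamb_step(params, grads, state, lr=0.05, wd=0.0)
print([float(np.linalg.norm(p)) for p in params])            # both layers shrink toward zero
```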

A survey of optimization methods from a machine learning perspective

S Sun, Z Cao, H Zhu, J Zhao - IEEE Transactions on Cybernetics, 2019 - ieeexplore.ieee.org
Machine learning is developing rapidly, has produced many theoretical breakthroughs, and is
widely applied in various fields. Optimization, as an important part of machine learning, has …

Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition

H Karimi, J Nutini, M Schmidt - … Conference, ECML PKDD 2016, Riva del …, 2016 - Springer
In 1963, Polyak proposed a simple condition that is sufficient to show a global linear
convergence rate for gradient descent. This condition is a special case of the Łojasiewicz …
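
For reference, the condition and the resulting rate can be stated in two lines; this restates the standard form of the result rather than quoting the paper.

```latex
% Polyak-Lojasiewicz (PL) condition and the resulting linear rate for gradient
% descent with step size 1/L on an L-smooth function f with optimal value f*.
\[
  \tfrac{1}{2}\,\lVert \nabla f(x) \rVert^{2} \;\ge\; \mu\,\bigl(f(x) - f^{\ast}\bigr)
  \quad \text{for all } x
\]
\[
  x_{k+1} = x_k - \tfrac{1}{L}\,\nabla f(x_k)
  \quad \Longrightarrow \quad
  f(x_k) - f^{\ast} \;\le\; \Bigl(1 - \tfrac{\mu}{L}\Bigr)^{k}\,\bigl(f(x_0) - f^{\ast}\bigr).
\]
```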

SPIDER: Near-optimal non-convex optimization via stochastic path-integrated differential estimator

C Fang, CJ Li, Z Lin, T Zhang - Advances in neural …, 2018 - proceedings.neurips.cc
In this paper, we propose a new technique named Stochastic Path-Integrated
Differential EstimatoR (SPIDER), which can be used to track many deterministic quantities of …
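
The estimator itself is a recursive, path-integrated correction that is refreshed periodically with a full (or large-batch) gradient. A NumPy sketch follows; the paper pairs the estimator with a normalized step size, whereas this sketch uses a plain step, and the batch size, refresh period q, and toy problem are illustrative assumptions.

```python
import numpy as np

def spider(grad_batch, full_grad, w0, n, lr=0.01, iters=2000, q=50, batch=10, seed=0):
    """SPIDER-style estimator sketch: refresh v with a full gradient every q steps,
    otherwise propagate v <- grad_B(w) - grad_B(w_prev) + v along the iterate path."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    w_prev, v = None, None
    for k in range(iters):
        if k % q == 0:
            v = full_grad(w)                                       # periodic refresh
        else:
            idx = rng.integers(n, size=batch)
            v = grad_batch(w, idx) - grad_batch(w_prev, idx) + v   # path-integrated correction
        w_prev = w.copy()
        w -= lr * v
    return w

# Toy least-squares usage.
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5)); w_true = rng.normal(size=5); y = X @ w_true
gb = lambda w, idx: 2.0 * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
fg = lambda w: 2.0 * X.T @ (X @ w - y) / len(y)
print(np.linalg.norm(spider(gb, fg, np.zeros(5), len(y)) - w_true))   # should be small
```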