Stochastic gradient descent and its variants in machine learning

P Netrapalli - Journal of the Indian Institute of Science, 2019 - Springer

A survey of stochastic simulation and optimization methods in signal processing

M Pereyra, P Schniter, E Chouzenoux… - IEEE Journal of …, 2015 - ieeexplore.ieee.org
Modern signal processing (SP) methods rely very heavily on probability and statistics to
solve challenging SP problems. SP methods are now expected to deal with ever more …

Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation

M Belkin - Acta Numerica, 2021 - cambridge.org
In the past decade the mathematical theory of machine learning has lagged far behind the
triumphs of deep neural networks on practical challenges. However, the gap between theory …

A unified theory of decentralized SGD with changing topology and local updates

A Koloskova, N Loizou, S Boreiri… - … on machine learning, 2020 - proceedings.mlr.press
Decentralized stochastic optimization methods have recently gained a lot of attention, mainly
because of their cheap per-iteration cost, data locality, and communication efficiency. In …

Randomized numerical linear algebra: Foundations and algorithms

PG Martinsson, JA Tropp - Acta Numerica, 2020 - cambridge.org
This survey describes probabilistic algorithms for linear algebraic computations, such as
factorizing matrices and solving linear systems. It focuses on techniques that have a proven …
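
To make the flavor of these probabilistic algorithms concrete, below is a minimal sketch of a randomized range finder followed by a small deterministic SVD, the basic randomized low-rank factorization pattern this kind of survey covers; the test matrix, target rank k, and oversampling parameter are arbitrary choices for illustration, not taken from the paper.

```python
import numpy as np

def randomized_svd(A, k, oversample=10, rng=None):
    """Approximate rank-k SVD via a randomized range finder (illustrative sketch)."""
    rng = np.random.default_rng(rng)
    m, n = A.shape
    # Sketch the range of A with a Gaussian test matrix.
    Omega = rng.standard_normal((n, k + oversample))
    Q, _ = np.linalg.qr(A @ Omega)            # orthonormal basis for the sampled range
    # Project A onto the subspace and run a small deterministic SVD.
    B = Q.T @ A                               # (k + oversample) x n
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    U = Q @ U_small
    return U[:, :k], s[:k], Vt[:k, :]

# Toy usage: a 500 x 300 matrix of exact rank 20.
rng = np.random.default_rng(0)
A = rng.standard_normal((500, 20)) @ rng.standard_normal((20, 300))
U, s, Vt = randomized_svd(A, k=20, rng=0)
print(np.linalg.norm(A - (U * s) @ Vt) / np.linalg.norm(A))  # relative error, near machine precision
```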

Sparsified SGD with memory

SU Stich, JB Cordonnier… - Advances in neural …, 2018 - proceedings.neurips.cc
Huge-scale machine learning problems are nowadays tackled by distributed optimization
algorithms, i.e., algorithms that leverage the compute power of many devices for training. The …
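
As a rough illustration of the idea in the title, here is a minimal single-worker sketch of SGD with top-k sparsification and an error-feedback memory that accumulates the dropped coordinates; the least-squares objective, step size, and sparsity level k are placeholders, not the paper's setup or guarantees.

```python
import numpy as np

def top_k(v, k):
    """Keep the k largest-magnitude entries of v, zero out the rest."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = v[idx]
    return out

# Toy least-squares objective f(x) = 0.5 * ||A x - b||^2.
rng = np.random.default_rng(0)
A, b = rng.standard_normal((200, 50)), rng.standard_normal(200)

x = np.zeros(50)
memory = np.zeros(50)        # accumulates the coordinates dropped by sparsification
lr, k = 0.001, 5
for t in range(2000):
    i = rng.integers(200)                    # single random sample (stochastic gradient)
    grad = A[i] * (A[i] @ x - b[i])
    update = memory + lr * grad              # add back what was left out previously
    sparse_update = top_k(update, k)         # only k coordinates would be transmitted
    memory = update - sparse_update          # remember the rest for later rounds
    x -= sparse_update
```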

SGD: General analysis and improved rates

RM Gower, N Loizou, X Qian… - International …, 2019 - proceedings.mlr.press
We propose a general yet simple theorem describing the convergence of SGD under the
arbitrary sampling paradigm. Our theorem describes the convergence of an infinite array of …
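
To make the sampling-paradigm idea concrete, here is a minimal sketch of SGD on a finite sum where examples are drawn with non-uniform probabilities and the stochastic gradient is reweighted to stay unbiased; the objective, the choice of probabilities, and the step size are illustrative assumptions, not the rates or conditions from the paper.

```python
import numpy as np

# Finite-sum least squares: f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2.
rng = np.random.default_rng(0)
n, d = 500, 20
A, b = rng.standard_normal((n, d)), rng.standard_normal(n)

# Importance sampling: pick example i with probability proportional to ||a_i||^2.
probs = np.sum(A**2, axis=1)
probs /= probs.sum()

x = np.zeros(d)
lr = 0.01
for t in range(5000):
    i = rng.choice(n, p=probs)
    # Reweight by 1/(n * p_i) so the stochastic gradient remains unbiased.
    grad = A[i] * (A[i] @ x - b[i]) / (n * probs[i])
    x -= lr * grad
```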

Federated optimization: Distributed machine learning for on-device intelligence

J Konečný, HB McMahan, D Ramage… - arXiv preprint arXiv …, 2016 - arxiv.org
We introduce a new and increasingly relevant setting for distributed optimization in machine
learning, where the data defining the optimization are unevenly distributed over an …
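
A minimal sketch of the local-update-then-average pattern that underlies this setting is given below; the unevenly sized client datasets, the number of local steps, the weighting, and all constants are assumptions made for illustration and are not the paper's specific algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_clients = 20, 5

# Unevenly sized local datasets, as in the federated setting.
client_data = []
x_true = rng.standard_normal(d)
for c in range(num_clients):
    n_c = rng.integers(20, 200)
    A_c = rng.standard_normal((n_c, d)) + c          # each client's data has its own shift
    client_data.append((A_c, A_c @ x_true + 0.1 * rng.standard_normal(n_c)))

def local_sgd(x, A, b, steps=20, lr=0.001, rng=rng):
    """A few SGD steps on one client's local least-squares loss."""
    x = x.copy()
    for _ in range(steps):
        i = rng.integers(len(b))
        x -= lr * A[i] * (A[i] @ x - b[i])
    return x

# Server loop: broadcast the model, train locally, average weighted by dataset size.
x = np.zeros(d)
sizes = np.array([len(b) for _, b in client_data], dtype=float)
weights = sizes / sizes.sum()
for rnd in range(50):
    client_models = [local_sgd(x, A, b) for A, b in client_data]
    x = sum(w * xc for w, xc in zip(weights, client_models))
```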

Can decentralized algorithms outperform centralized algorithms? A case study for decentralized parallel stochastic gradient descent

X Lian, C Zhang, H Zhang, CJ Hsieh… - Advances in neural …, 2017 - proceedings.neurips.cc
Most distributed machine learning systems nowadays, including TensorFlow and CNTK, are
built in a centralized fashion. One bottleneck of centralized algorithms lies in the high …
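
The sketch below illustrates the decentralized pattern this entry refers to: each worker takes a local stochastic gradient step and then averages only with its neighbours via a gossip (mixing) matrix, with no central server. The ring topology, quadratic local losses, and constants are assumptions for illustration, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n_workers, d = 8, 10

# Each worker holds its own local least-squares data.
data = [(rng.standard_normal((100, d)), rng.standard_normal(100)) for _ in range(n_workers)]

# Ring topology: each worker mixes equally with itself and its two neighbours.
W = np.zeros((n_workers, n_workers))
for i in range(n_workers):
    W[i, i] = W[i, (i - 1) % n_workers] = W[i, (i + 1) % n_workers] = 1 / 3

X = np.zeros((n_workers, d))     # row j = worker j's model
lr = 0.005
for t in range(2000):
    grads = np.zeros_like(X)
    for j, (A, b) in enumerate(data):
        i = rng.integers(100)
        grads[j] = A[i] * (A[i] @ X[j] - b[i])
    # Gossip step (neighbour averaging, no central server) followed by a local SGD step.
    X = W @ X - lr * grads
```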

Qsparse-local-SGD: Distributed SGD with quantization, sparsification and local computations

D Basu, D Data, C Karakus… - Advances in Neural …, 2019 - proceedings.neurips.cc
The communication bottleneck has been identified as a significant issue in the distributed
optimization of large-scale learning models. Recently, several approaches to mitigate this …
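
As a rough sketch of how the three ingredients in the title can compose, the example below runs a few local SGD steps, then compresses the accumulated update with top-k sparsification plus a simple sign-and-scale quantizer, carrying the compression residual forward; this is a single-worker illustration with placeholder constants, not the paper's exact scheme or analysis.

```python
import numpy as np

def sparsify_and_quantize(v, k):
    """Top-k sparsification followed by 1-bit (sign times mean-magnitude) quantization."""
    out = np.zeros_like(v)
    idx = np.argpartition(np.abs(v), -k)[-k:]
    out[idx] = np.sign(v[idx]) * np.mean(np.abs(v[idx]))
    return out

rng = np.random.default_rng(0)
A, b = rng.standard_normal((200, 50)), rng.standard_normal(200)

x = np.zeros(50)          # model kept in sync via compressed updates
error = np.zeros(50)      # error-compensation memory for the compressor
lr, k, local_steps = 0.001, 5, 10
for rnd in range(300):
    # Local phase: a few uncompressed SGD steps from the current model.
    x_local = x.copy()
    for _ in range(local_steps):
        i = rng.integers(200)
        x_local -= lr * A[i] * (A[i] @ x_local - b[i])
    # Communication phase: compress the accumulated update, carry the residual forward.
    update = (x - x_local) + error
    compressed = sparsify_and_quantize(update, k)
    error = update - compressed
    x -= compressed
```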