Opening the black box of deep neural networks via information
R Shwartz-Ziv, N Tishby - arXiv preprint arXiv:1703.00810, 2017 - arxiv.org
Despite their great success, there is still no comprehensive theoretical understanding of
learning with Deep Neural Networks (DNNs) or their inner organization. Previous work …
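This paper's information-plane analysis estimates mutual information between a layer's activations T and the labels Y by discretizing activations into bins. A minimal sketch of that kind of binning estimator, assuming equal-width bins; the bin count, helper names, and toy data are illustrative, not taken from the paper:

```python
import numpy as np

def discretize(acts, n_bins=30):
    """Bin each activation into one of n_bins equal-width bins."""
    edges = np.linspace(acts.min(), acts.max(), n_bins + 1)
    return np.digitize(acts, edges[1:-1])

def entropy_bits(symbols):
    """Shannon entropy (bits) of the empirical distribution over rows."""
    _, counts = np.unique(symbols, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_ty(acts, labels, n_bins=30):
    """I(T;Y) = H(T) - H(T|Y), with T the binned layer activations."""
    t = discretize(acts, n_bins)
    h_cond = sum((labels == y).mean() * entropy_bits(t[labels == y])
                 for y in np.unique(labels))
    return entropy_bits(t) - h_cond

rng = np.random.default_rng(0)
acts = rng.standard_normal((1000, 4))    # stand-in "layer activations"
labels = rng.integers(0, 2, size=1000)   # stand-in labels
print(info_ty(acts, labels))             # small for random activations, finite-sample bias aside
```
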
Stochastic gradient descent and its variants in machine learning
P Netrapalli - Journal of the Indian Institute of Science, 2019 - Springer
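The baseline algorithm this survey covers is the plain SGD update w_{k+1} = w_k - eta * g_k, where g_k is a mini-batch gradient estimate. A minimal sketch on a toy least-squares problem; the objective, step size, and batch size are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))
b = rng.standard_normal(200)
w = np.zeros(5)
eta, batch = 0.01, 10                       # constant step size, batch size

for _ in range(5000):
    idx = rng.integers(0, 200, size=batch)  # sample a mini-batch
    g = A[idx].T @ (A[idx] @ w - b[idx]) / batch
    w -= eta * g                            # w_{k+1} = w_k - eta * g_k
print(np.linalg.norm(A @ w - b))
```
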
Stochastic gradient descent as approximate Bayesian inference
Stochastic Gradient Descent with a constant learning rate (constant SGD) simulates a
Markov chain with a stationary distribution. With this perspective, we derive several new …
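The Markov-chain view in this snippet can be seen numerically: with a constant step size, the iterates do not converge to a point but fluctuate around the minimum, and their empirical distribution approximates a stationary one. A minimal 1-D sketch with synthetic gradient noise on a quadratic loss; all constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
eta, sigma = 0.1, 1.0          # constant step size, gradient-noise scale
x, burn_in, n = 0.0, 1000, 100000
samples = []
for k in range(burn_in + n):
    grad = x + sigma * rng.standard_normal()  # noisy gradient of 0.5*x^2
    x -= eta * grad                           # constant-step SGD update
    if k >= burn_in:
        samples.append(x)
samples = np.array(samples)
# For this linear chain the stationary variance works out to
# eta*sigma^2 / (2 - eta); compare empirical vs. predicted:
print(samples.var(), eta * sigma**2 / (2 - eta))
```
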
High-dimensional limit theorems for SGD: Effective dynamics and critical scaling
We study the scaling limits of stochastic gradient descent (SGD) with constant step-size in
the high-dimensional regime. We prove limit theorems for the trajectories of summary …
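One way to see what "trajectories of summary statistics" means: run online SGD in dimension d and track a low-dimensional statistic, such as the overlap with a planted signal; as d grows, the trajectory concentrates around a deterministic ODE curve. A toy sketch, assuming a simple linear-regression model with a planted vector; the model and constants are illustrative, not the paper's setting:

```python
import numpy as np

rng = np.random.default_rng(0)
d, c = 4000, 2.0                      # dimension; actual step size is c/d
w_star = np.ones(d) / np.sqrt(d)      # planted signal, ||w_star|| = 1
w = np.zeros(d)
overlaps = []
for k in range(4 * d):                # rescaled time: t = k/d
    a = rng.standard_normal(d)        # fresh sample each step (online SGD)
    y = a @ w_star
    w -= (c / d) * (a @ w - y) * a
    overlaps.append(w @ w_star)
# The overlap m(t) concentrates around the ODE m'(t) = c*(1 - m(t)),
# i.e. m(t) = 1 - exp(-c*t); the gap shrinks as d grows:
t = np.arange(1, 4 * d + 1) / d
print(np.max(np.abs(np.array(overlaps) - (1 - np.exp(-c * t)))))
```
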
On sampling from a log-concave density using kinetic Langevin diffusions
AS Dalalyan, L Riou-Durand - 2020 - projecteuclid.org
Langevin diffusion processes and their discretizations are often used for sampling from a
target density. The most convenient framework for assessing the quality of such a sampling …
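The process in question is the kinetic (underdamped) Langevin diffusion dv = -gamma*v dt - grad U(x) dt + sqrt(2*gamma) dW, dx = v dt. A minimal sketch of one Euler-type discretization on a standard Gaussian target; the paper analyzes sharper schemes, and the step size and friction here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_U(x):
    """Potential U(x) = 0.5*||x||^2, so the target is a standard Gaussian."""
    return x

d, gamma, h, n = 2, 2.0, 0.05, 50000
x, v = np.zeros(d), np.zeros(d)
samples = []
for _ in range(n):
    # One Euler-type step of the kinetic Langevin diffusion:
    v += -h * (gamma * v + grad_U(x)) + np.sqrt(2 * gamma * h) * rng.standard_normal(d)
    x += h * v
    samples.append(x.copy())
samples = np.array(samples)
# Target: mean 0, variance 1 per coordinate (up to discretization bias).
print(samples.mean(axis=0), samples.var(axis=0))
```
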
The heavy-tail phenomenon in SGD
In recent years, various notions of capacity and complexity have been proposed for
characterizing the generalization properties of stochastic gradient descent (SGD) in deep …
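A rough way to probe the heavy-tail claim empirically is to run SGD on least squares with a large step-size-to-batch ratio and estimate the tail index of the iterates with a Hill-type estimator. This sketch is a crude probe, not the paper's estimator; the model, constants, and cutoff k are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

def iterate_norms(eta, batch, steps=20000):
    """Norms of SGD iterates on least squares, after a burn-in."""
    w, out = np.zeros(d), []
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch)
        w -= eta * A[idx].T @ (A[idx] @ w - b[idx]) / batch
        out.append(np.linalg.norm(w))
    return np.array(out[steps // 2:])

def hill(x, k=500):
    """Hill-type tail-index estimate from the k largest order statistics."""
    top = np.sort(x)[-k:]
    return 1.0 / np.mean(np.log(top[1:] / top[0]))

print(hill(iterate_norms(eta=0.01, batch=50)))  # small eta/batch: lighter tails
print(hill(iterate_norms(eta=0.10, batch=1)))   # large eta/batch: heavier tails
```
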
Understanding the role of momentum in stochastic gradient methods
The use of momentum in stochastic gradient methods has become a widespread practice in
machine learning. Different variants of momentum, including heavy-ball momentum …
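For reference alongside this entry, the heavy-ball variant it mentions adds a velocity buffer to the plain SGD update. A minimal sketch; the objective and hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))
b = rng.standard_normal(200)

def stochastic_grad(w, batch=10):
    idx = rng.integers(0, A.shape[0], size=batch)
    return A[idx].T @ (A[idx] @ w - b[idx]) / batch

w, v = np.zeros(5), np.zeros(5)
eta, beta = 0.01, 0.9                   # step size, momentum coefficient
for _ in range(5000):
    v = beta * v + stochastic_grad(w)   # heavy-ball: accumulate a velocity
    w -= eta * v
print(np.linalg.norm(A @ w - b))
```
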
Stochastic gradient descent with noise of machine learning type. Part I: Discrete time analysis
S Wojtowytsch - Journal of Nonlinear Science, 2023 - Springer
Stochastic gradient descent (SGD) is one of the most popular algorithms in modern machine
learning. The noise encountered in these applications is different from that in many …
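The sense in which ML-type noise is "different" is that it is typically multiplicative: it scales with the distance to the optimum and vanishes at the minimizer of an interpolating model, unlike the constant additive noise of classical Langevin-style analyses. A toy 1-D contrast; all constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
eta, steps = 0.1, 20000

def run(noise):
    """Minimize 0.5*x^2 with SGD-style noisy gradients of the given type."""
    x, tail = 2.0, []
    for k in range(steps):
        if noise == "additive":
            g = x + 0.5 * rng.standard_normal()        # classical: constant-scale noise
        else:
            g = x * (1 + 0.5 * rng.standard_normal())  # ML-type: noise shrinks with x
        x -= eta * g
        if k > steps // 2:
            tail.append(abs(x))
    return np.mean(tail)

# Additive noise leaves O(sqrt(eta)) stationary fluctuations; multiplicative
# noise lets the iterates converge to the minimizer itself.
print(run("additive"), run("multiplicative"))
```
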
Sharp bounds for federated averaging (Local SGD) and continuous perspective
Federated Averaging (FedAvg), also known as Local SGD, is one of the most
popular algorithms in Federated Learning (FL). Despite its simplicity and popularity, the …
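FedAvg (Local SGD) alternates K local SGD steps on each client with a global averaging step on the server. A minimal sketch, assuming a toy setup where each client holds its own least-squares problem to mimic data heterogeneity; all names and constants here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
M, d, n_local = 5, 3, 40
As = rng.standard_normal((M, n_local, d))   # client m's features
bs = rng.standard_normal((M, n_local))      # client m's targets

def local_sgd(w, m, K=10, eta=0.01, batch=8):
    """K local mini-batch SGD steps on client m, starting from the global w."""
    w = w.copy()
    for _ in range(K):
        idx = rng.integers(0, n_local, size=batch)
        w -= eta * As[m, idx].T @ (As[m, idx] @ w - bs[m, idx]) / batch
    return w

w = np.zeros(d)
for _ in range(200):                        # communication rounds
    w = np.mean([local_sgd(w, m) for m in range(M)], axis=0)  # server averages
loss = np.mean([np.mean((As[m] @ w - bs[m]) ** 2) for m in range(M)])
print(loss)
```
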
The implicit regularization of stochastic gradient flow for least squares
We study the implicit regularization of mini-batch stochastic gradient descent, when applied
to the fundamental problem of least squares regression. We leverage a continuous-time …
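The continuous-time object here is gradient flow w'(t) = -grad L(w(t)) on the least-squares loss; run from zero, its path is known to track the ridge regularization path under the correspondence lambda ≈ 1/t. A sketch comparing the two on a small problem; the Euler step and the time grid are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + rng.standard_normal(n)
H, g = A.T @ A / n, A.T @ b / n

def flow(t, h=1e-3):
    """Euler-discretized gradient flow w' = -(H w - g), started at w(0) = 0."""
    w = np.zeros(d)
    for _ in range(int(t / h)):
        w -= h * (H @ w - g)
    return w

def ridge(lam):
    return np.linalg.solve(H + lam * np.eye(d), g)

for t in [0.5, 2.0, 10.0]:
    # Early stopping at time t roughly matches ridge with lam = 1/t.
    print(t, np.linalg.norm(flow(t) - ridge(1.0 / t)))
```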