Opening the black box of deep neural networks via information

R Shwartz-Ziv, N Tishby - arXiv preprint arXiv:1703.00810, 2017 - arxiv.org
Despite their great success, there is still no comprehensive theoretical understanding of
learning with Deep Neural Networks (DNNs) or their inner organization. Previous work …

Stochastic gradient descent and its variants in machine learning

P Netrapalli - Journal of the Indian Institute of Science, 2019 - Springer

Stochastic gradient descent as approximate Bayesian inference

S Mandt, MD Hoffman, DM Blei - Journal of Machine Learning …, 2017 - jmlr.org
Stochastic Gradient Descent with a constant learning rate (constant SGD) simulates a
Markov chain with a stationary distribution. With this perspective, we derive several new …
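The constant-learning-rate Markov-chain view in this snippet can be illustrated with a minimal sketch; the 1-D quadratic loss, noise scale, step size, and seed below are assumptions for illustration, not taken from the paper:

```python
import numpy as np

# Constant-step SGD on f(w) = w^2 / 2 with additive N(0,1) gradient noise.
# The iterates form a Markov chain that fluctuates around the minimum with
# a stationary variance of eta / (2 - eta) rather than converging to a point.
rng = np.random.default_rng(0)
eta = 0.1                 # constant learning rate (assumed value)
w = 5.0
iterates = []
for t in range(20000):
    grad = w + rng.normal()        # noisy gradient of f(w) = w^2 / 2
    w -= eta * grad
    if t >= 10000:                 # discard burn-in, keep stationary samples
        iterates.append(w)
samples = np.array(iterates)
print(samples.mean(), samples.var())  # mean near 0, variance near 0.1/1.9
```

The stationary variance follows from the linear recursion w ← (1 − η)w − ηξ, whose fixed-point variance is η²/(1 − (1 − η)²) = η/(2 − η).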

High-dimensional limit theorems for SGD: Effective dynamics and critical scaling

G Ben Arous, R Gheissari… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the scaling limits of stochastic gradient descent (SGD) with constant step-size in
the high-dimensional regime. We prove limit theorems for the trajectories of summary …

On sampling from a log-concave density using kinetic Langevin diffusions

AS Dalalyan, L Riou-Durand - 2020 - projecteuclid.org
Langevin diffusion processes and their discretizations are often used for sampling from a
target density. The most convenient framework for assessing the quality of such a sampling …
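As a rough illustration of the setup this entry studies, the following is a sketch of one possible Euler-type discretization of the kinetic (underdamped) Langevin diffusion, targeting a standard Gaussian as the log-concave density; the step size, friction, and discretization scheme are assumptions, not the paper's analyzed method:

```python
import numpy as np

# Discretize dv = -(gamma*v + grad_U(x)) dt + sqrt(2*gamma) dW, dx = v dt
# for U(x) = x^2 / 2, so the target density p(x) ∝ exp(-U(x)) is N(0, 1).
rng = np.random.default_rng(1)
eta, gamma = 0.05, 1.0            # step size and friction (assumed values)
x, v = 0.0, 0.0
xs = []
for k in range(110000):
    grad_U = x                    # ∇U(x) for U(x) = x^2 / 2
    v += -eta * (gamma * v + grad_U) + np.sqrt(2.0 * gamma * eta) * rng.normal()
    x += eta * v                  # position update uses the refreshed velocity
    if k >= 10000:                # discard burn-in
        xs.append(x)
xs = np.array(xs)
print(xs.mean(), xs.var())        # close to the target's mean 0, variance 1
```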

The heavy-tail phenomenon in SGD

M Gurbuzbalaban, U Simsekli… - … Conference on Machine …, 2021 - proceedings.mlr.press
In recent years, various notions of capacity and complexity have been proposed for
characterizing the generalization properties of stochastic gradient descent (SGD) in deep …

Understanding the role of momentum in stochastic gradient methods

I Gitman, H Lang, P Zhang… - Advances in Neural …, 2019 - proceedings.neurips.cc
The use of momentum in stochastic gradient methods has become a widespread practice in
machine learning. Different variants of momentum, including heavy-ball momentum …
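The heavy-ball variant this snippet mentions has a simple update rule; a minimal deterministic sketch on an assumed ill-conditioned quadratic:

```python
import numpy as np

# Heavy-ball momentum on f(w) = 0.5 * w^T A w: the gradient is accumulated
# into a velocity buffer, which damps oscillations along the stiff direction.
A = np.diag([1.0, 10.0])          # assumed quadratic with condition number 10
w = np.array([5.0, 5.0])
buf = np.zeros_like(w)            # momentum buffer
eta, beta = 0.05, 0.9             # learning rate and momentum (assumed values)
for _ in range(300):
    grad = A @ w                  # ∇f(w) = A w
    buf = beta * buf + grad       # heavy-ball accumulation
    w = w - eta * buf             # parameter update
print(np.linalg.norm(w))          # near 0: converged to the minimizer
```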

Stochastic gradient descent with noise of machine learning type Part I: Discrete time analysis

S Wojtowytsch - Journal of Nonlinear Science, 2023 - Springer
Stochastic gradient descent (SGD) is one of the most popular algorithms in modern machine
learning. The noise encountered in these applications is different from that in many …

Sharp bounds for federated averaging (local SGD) and continuous perspective

MR Glasgow, H Yuan, T Ma - International Conference on …, 2022 - proceedings.mlr.press
Abstract Federated Averaging (FedAvg), also known as Local SGD, is one of the most
popular algorithms in Federated Learning (FL). Despite its simplicity and popularity, the …
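The FedAvg / Local SGD scheme named in this entry alternates local gradient steps with server averaging; a toy sketch with hypothetical one-dimensional quadratic client objectives (all values below are assumptions for illustration):

```python
import numpy as np

# Each client m minimizes f_m(w) = 0.5 * (w - c_m)^2. Per round, the server
# broadcasts its model, clients run K local SGD steps, and the server
# averages the returned models (the FedAvg update).
centers = np.array([1.0, 2.0, 3.0, 4.0])   # per-client optima c_m (assumed)
w_server = 0.0
eta, K = 0.1, 10                           # local step size and local steps
for rnd in range(100):                     # communication rounds
    local = []
    for c in centers:
        w = w_server                       # client starts from server model
        for _ in range(K):
            w -= eta * (w - c)             # local gradient step on f_m
        local.append(w)
    w_server = np.mean(local)              # server averages local models
print(w_server)   # converges to the average of client optima, here 2.5
```

With equal client curvatures the average of optima is also the global minimizer of the summed objective, so FedAvg lands exactly on it in this toy case.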

The implicit regularization of stochastic gradient flow for least squares

A Ali, E Dobriban, R Tibshirani - International conference on …, 2020 - proceedings.mlr.press
We study the implicit regularization of mini-batch stochastic gradient descent, when applied
to the fundamental problem of least squares regression. We leverage a continuous-time …
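As context for this entry's implicit-regularization theme, here is a simplified full-batch analogue (an assumed simplification, not the paper's stochastic gradient flow): gradient descent started at zero on an underdetermined least-squares problem converges to the minimum-norm interpolating solution, matching the pseudoinverse solution:

```python
import numpy as np

# With n < d there are infinitely many interpolators, but gradient descent
# from w = 0 stays in the row space of X and so selects the minimum-norm one.
rng = np.random.default_rng(2)
X = rng.normal(size=(20, 50))     # n = 20 samples, d = 50 features (assumed)
y = rng.normal(size=20)
w = np.zeros(50)
eta = 0.01                        # step size (assumed, within stability range)
for _ in range(50000):
    w -= eta * X.T @ (X @ w - y)  # gradient of 0.5 * ||X w - y||^2
w_minnorm = np.linalg.pinv(X) @ y # minimum-norm least-squares solution
print(np.linalg.norm(w - w_minnorm))   # near 0
```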