Opening the black box of deep neural networks via information
R Shwartz-Ziv, N Tishby - arXiv preprint arXiv:1703.00810, 2017 - arxiv.org
Despite their great success, there is still no comprehensive theoretical understanding of
learning with Deep Neural Networks (DNNs) or their inner organization. Previous work …
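This paper's information-plane analysis estimates mutual information between a layer's activations T and the labels Y by discretizing activations into bins. A minimal sketch of that kind of binning estimator, assuming equal-width bins; the bin count, helper names, and toy data are illustrative, not taken from the paper:

```python
import numpy as np

def discretize(acts, n_bins=30):
    """Bin each activation into one of n_bins equal-width bins."""
    edges = np.linspace(acts.min(), acts.max(), n_bins + 1)
    return np.digitize(acts, edges[1:-1])

def entropy_bits(symbols):
    """Shannon entropy (bits) of the empirical distribution over rows."""
    _, counts = np.unique(symbols, axis=0, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def info_ty(acts, labels, n_bins=30):
    """I(T;Y) = H(T) - H(T|Y), with T the binned layer activations."""
    t = discretize(acts, n_bins)
    h_cond = sum((labels == y).mean() * entropy_bits(t[labels == y])
                 for y in np.unique(labels))
    return entropy_bits(t) - h_cond

rng = np.random.default_rng(0)
acts = rng.standard_normal((1000, 4))    # stand-in "layer activations"
labels = rng.integers(0, 2, size=1000)   # stand-in labels
print(info_ty(acts, labels))             # small for random activations, finite-sample bias aside
```
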
Stochastic gradient descent and its variants in machine learning
P Netrapalli - Journal of the Indian Institute of Science, 2019 - Springer
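The baseline algorithm this survey covers is the plain SGD update w_{k+1} = w_k - eta * g_k, where g_k is a mini-batch gradient estimate. A minimal sketch on a toy least-squares problem; the objective, step size, and batch size are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))
b = rng.standard_normal(200)
w = np.zeros(5)
eta, batch = 0.01, 10                       # constant step size, batch size

for _ in range(5000):
    idx = rng.integers(0, 200, size=batch)  # sample a mini-batch
    g = A[idx].T @ (A[idx] @ w - b[idx]) / batch
    w -= eta * g                            # w_{k+1} = w_k - eta * g_k
print(np.linalg.norm(A @ w - b))
```
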
Stochastic gradient descent as approximate Bayesian inference
Stochastic Gradient Descent with a constant learning rate (constant SGD) simulates a
Markov chain with a stationary distribution. With this perspective, we derive several new …
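The Markov-chain view in this snippet can be seen numerically: with a constant step size, the iterates do not converge to a point but fluctuate around the minimum, and their empirical distribution approximates a stationary one. A minimal 1-D sketch with synthetic gradient noise on a quadratic loss; all constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
eta, sigma = 0.1, 1.0          # constant step size, gradient-noise scale
x, burn_in, n = 0.0, 1000, 100000
samples = []
for k in range(burn_in + n):
    grad = x + sigma * rng.standard_normal()  # noisy gradient of 0.5*x^2
    x -= eta * grad                           # constant-step SGD update
    if k >= burn_in:
        samples.append(x)
samples = np.array(samples)
# For this linear chain the stationary variance works out to
# eta*sigma^2 / (2 - eta); compare empirical vs. predicted:
print(samples.var(), eta * sigma**2 / (2 - eta))
```
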
High-dimensional limit theorems for SGD: Effective dynamics and critical scaling
We study the scaling limits of stochastic gradient descent (SGD) with constant step-size in
the high-dimensional regime. We prove limit theorems for the trajectories of summary …
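One way to see what "trajectories of summary statistics" means: run online SGD in dimension d and track a low-dimensional statistic, such as the overlap with a planted signal; as d grows, the trajectory concentrates around a deterministic ODE curve. A toy sketch, assuming a simple linear-regression model with a planted vector; the model and constants are illustrative, not the paper's setting:

```python
import numpy as np

rng = np.random.default_rng(0)
d, c = 4000, 2.0                      # dimension; actual step size is c/d
w_star = np.ones(d) / np.sqrt(d)      # planted signal, ||w_star|| = 1
w = np.zeros(d)
overlaps = []
for k in range(4 * d):                # rescaled time: t = k/d
    a = rng.standard_normal(d)        # fresh sample each step (online SGD)
    y = a @ w_star
    w -= (c / d) * (a @ w - y) * a
    overlaps.append(w @ w_star)
# The overlap m(t) concentrates around the ODE m'(t) = c*(1 - m(t)),
# i.e. m(t) = 1 - exp(-c*t); the gap shrinks as d grows:
t = np.arange(1, 4 * d + 1) / d
print(np.max(np.abs(np.array(overlaps) - (1 - np.exp(-c * t)))))
```
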
On sampling from a log-concave density using kinetic Langevin diffusions
AS Dalalyan, L Riou-Durand - 2020 - projecteuclid.org
Langevin diffusion processes and their discretizations are often used for sampling from a
target density. The most convenient framework for assessing the quality of such a sampling …
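The process in question is the kinetic (underdamped) Langevin diffusion dv = -gamma*v dt - grad U(x) dt + sqrt(2*gamma) dW, dx = v dt. A minimal sketch of one Euler-type discretization on a standard Gaussian target; the paper analyzes sharper schemes, and the step size and friction here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_U(x):
    """Potential U(x) = 0.5*||x||^2, so the target is a standard Gaussian."""
    return x

d, gamma, h, n = 2, 2.0, 0.05, 50000
x, v = np.zeros(d), np.zeros(d)
samples = []
for _ in range(n):
    # One Euler-type step of the kinetic Langevin diffusion:
    v += -h * (gamma * v + grad_U(x)) + np.sqrt(2 * gamma * h) * rng.standard_normal(d)
    x += h * v
    samples.append(x.copy())
samples = np.array(samples)
# Target: mean 0, variance 1 per coordinate (up to discretization bias).
print(samples.mean(axis=0), samples.var(axis=0))
```
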
The heavy-tail phenomenon in SGD
In recent years, various notions of capacity and complexity have been proposed for
characterizing the generalization properties of stochastic gradient descent (SGD) in deep …
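A rough way to probe the heavy-tail claim empirically is to run SGD on least squares with a large step-size-to-batch ratio and estimate the tail index of the iterates with a Hill-type estimator. This sketch is a crude probe, not the paper's estimator; the model, constants, and cutoff k are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 1000, 10
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)

def iterate_norms(eta, batch, steps=20000):
    """Norms of SGD iterates on least squares, after a burn-in."""
    w, out = np.zeros(d), []
    for _ in range(steps):
        idx = rng.integers(0, n, size=batch)
        w -= eta * A[idx].T @ (A[idx] @ w - b[idx]) / batch
        out.append(np.linalg.norm(w))
    return np.array(out[steps // 2:])

def hill(x, k=500):
    """Hill-type tail-index estimate from the k largest order statistics."""
    top = np.sort(x)[-k:]
    return 1.0 / np.mean(np.log(top[1:] / top[0]))

print(hill(iterate_norms(eta=0.01, batch=50)))  # small eta/batch: lighter tails
print(hill(iterate_norms(eta=0.10, batch=1)))   # large eta/batch: heavier tails
```
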
Understanding the role of momentum in stochastic gradient methods
The use of momentum in stochastic gradient methods has become a widespread practice in
machine learning. Different variants of momentum, including heavy-ball momentum …
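For reference alongside this entry, the heavy-ball variant it mentions adds a velocity buffer to the plain SGD update. A minimal sketch; the objective and hyperparameters are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 5))
b = rng.standard_normal(200)

def stochastic_grad(w, batch=10):
    idx = rng.integers(0, A.shape[0], size=batch)
    return A[idx].T @ (A[idx] @ w - b[idx]) / batch

w, v = np.zeros(5), np.zeros(5)
eta, beta = 0.01, 0.9                   # step size, momentum coefficient
for _ in range(5000):
    v = beta * v + stochastic_grad(w)   # heavy-ball: accumulate a velocity
    w -= eta * v
print(np.linalg.norm(A @ w - b))
```
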
Stochastic gradient descent with noise of machine learning type. Part I: Discrete time analysis
S Wojtowytsch - Journal of Nonlinear Science, 2023 - Springer
Stochastic gradient descent (SGD) is one of the most popular algorithms in modern machine
learning. The noise encountered in these applications is different from that in many …
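The sense in which ML-type noise is "different" is that it is typically multiplicative: it scales with the distance to the optimum and vanishes at the minimizer of an interpolating model, unlike the constant additive noise of classical Langevin-style analyses. A toy 1-D contrast; all constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
eta, steps = 0.1, 20000

def run(noise):
    """Minimize 0.5*x^2 with SGD-style noisy gradients of the given type."""
    x, tail = 2.0, []
    for k in range(steps):
        if noise == "additive":
            g = x + 0.5 * rng.standard_normal()        # classical: constant-scale noise
        else:
            g = x * (1 + 0.5 * rng.standard_normal())  # ML-type: noise shrinks with x
        x -= eta * g
        if k > steps // 2:
            tail.append(abs(x))
    return np.mean(tail)

# Additive noise leaves O(sqrt(eta)) stationary fluctuations; multiplicative
# noise lets the iterates converge to the minimizer itself.
print(run("additive"), run("multiplicative"))
```
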
Sharp bounds for federated averaging (Local SGD) and continuous perspective
Federated Averaging (FedAvg), also known as Local SGD, is one of the most
popular algorithms in Federated Learning (FL). Despite its simplicity and popularity, the …
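FedAvg (Local SGD) alternates K local SGD steps on each client with a global averaging step on the server. A minimal sketch, assuming a toy setup where each client holds its own least-squares problem to mimic data heterogeneity; all names and constants here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
M, d, n_local = 5, 3, 40
As = rng.standard_normal((M, n_local, d))   # client m's features
bs = rng.standard_normal((M, n_local))      # client m's targets

def local_sgd(w, m, K=10, eta=0.01, batch=8):
    """K local mini-batch SGD steps on client m, starting from the global w."""
    w = w.copy()
    for _ in range(K):
        idx = rng.integers(0, n_local, size=batch)
        w -= eta * As[m, idx].T @ (As[m, idx] @ w - bs[m, idx]) / batch
    return w

w = np.zeros(d)
for _ in range(200):                        # communication rounds
    w = np.mean([local_sgd(w, m) for m in range(M)], axis=0)  # server averages
loss = np.mean([np.mean((As[m] @ w - bs[m]) ** 2) for m in range(M)])
print(loss)
```
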
The implicit regularization of stochastic gradient flow for least squares
We study the implicit regularization of mini-batch stochastic gradient descent, when applied
to the fundamental problem of least squares regression. We leverage a continuous-time …
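The continuous-time object here is gradient flow w'(t) = -grad L(w(t)) on the least-squares loss; run from zero, its path is known to track the ridge regularization path under the correspondence lambda ≈ 1/t. A sketch comparing the two on a small problem; the Euler step and the time grid are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 5
A = rng.standard_normal((n, d))
b = A @ rng.standard_normal(d) + rng.standard_normal(n)
H, g = A.T @ A / n, A.T @ b / n

def flow(t, h=1e-3):
    """Euler-discretized gradient flow w' = -(H w - g), started at w(0) = 0."""
    w = np.zeros(d)
    for _ in range(int(t / h)):
        w -= h * (H @ w - g)
    return w

def ridge(lam):
    return np.linalg.solve(H + lam * np.eye(d), g)

for t in [0.5, 2.0, 10.0]:
    # Early stopping at time t roughly matches ridge with lam = 1/t.
    print(t, np.linalg.norm(flow(t) - ridge(1.0 / t)))
```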