Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation

M Belkin - Acta Numerica, 2021 - cambridge.org
In the past decade the mathematical theory of machine learning has lagged far behind the
triumphs of deep neural networks on practical challenges. However, the gap between theory …

Loss landscapes and optimization in over-parameterized non-linear systems and neural networks

C Liu, L Zhu, M Belkin - Applied and Computational Harmonic Analysis, 2022 - Elsevier
The success of deep learning is due, to a large extent, to the remarkable effectiveness of
gradient-based optimization methods applied to large neural networks. The purpose of this …
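
The snippet is cut off before the paper's framework is stated. Purely as background, and not necessarily in the paper's exact form, a typical condition in this line of work on over-parameterized systems is a PL-type inequality with zero optimal loss (the interpolation regime),

$$ \|\nabla \mathcal{L}(w)\|^2 \;\ge\; \mu\,\mathcal{L}(w) \quad \text{on a region of parameter space}, $$

under which gradient descent on $\mathcal{L}$ reaches a global minimizer at a linear rate despite non-convexity.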

Fast and faster convergence of sgd for over-parameterized models and an accelerated perceptron

S Vaswani, F Bach, M Schmidt - The 22nd international …, 2019 - proceedings.mlr.press
Modern machine learning focuses on highly expressive models that are able to fit or
interpolate the data completely, resulting in zero training loss. For such models, we show …
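
The snippet appeals to interpolation (zero training loss). A common formalization in this line of work, stated here only as background and not necessarily in the paper's exact form, is that a single parameter vector minimizes every individual loss, so every stochastic gradient vanishes at the solution:

$$ f_i(w^*) = \min_w f_i(w) \ \text{ for all } i \quad \Longrightarrow \quad \nabla f_i(w^*) = 0 \ \text{ for all } i, $$

often strengthened to a growth condition such as $\mathbb{E}_i\|\nabla f_i(w)\|^2 \le \rho\,\|\nabla f(w)\|^2$, under which constant-step-size SGD matches the convergence rate of full-batch gradient descent.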

Mixed-privacy forgetting in deep networks

A Golatkar, A Achille, A Ravichandran… - Proceedings of the …, 2021 - openaccess.thecvf.com
We show that the influence of a subset of the training samples can be removed--or "forgotten"--from the weights of a network trained on large-scale image classification tasks …

Painless stochastic gradient: Interpolation, line-search, and convergence rates

S Vaswani, A Mishkin, I Laradji… - Advances in neural …, 2019 - proceedings.neurips.cc
Recent works have shown that stochastic gradient descent (SGD) achieves the fast
convergence rates of full-batch gradient descent for over-parameterized models satisfying …
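
As a concrete illustration of a stochastic line search under interpolation, here is a minimal sketch of one standard backtracking (Armijo) variant; it is not necessarily the paper's exact procedure, and all names below are placeholders.

```python
# Sketch: SGD where the step size for each mini-batch is chosen by backtracking
# until a stochastic Armijo (sufficient-decrease) condition holds on that same batch.
import numpy as np

def sgd_armijo_step(w, grad_fn, loss_fn, batch, eta0=1.0, c=0.5, beta=0.7, max_backtracks=50):
    """One SGD step with a backtracking line search evaluated on the sampled mini-batch."""
    g = grad_fn(w, batch)          # stochastic gradient on this batch
    loss = loss_fn(w, batch)       # loss on the same batch
    eta = eta0
    for _ in range(max_backtracks):
        w_new = w - eta * g
        # Stochastic Armijo condition: sufficient decrease of the mini-batch loss.
        if loss_fn(w_new, batch) <= loss - c * eta * np.dot(g, g):
            return w_new, eta
        eta *= beta                # shrink the step and try again
    return w - eta * g, eta
```

The key point is that the sufficient-decrease test uses the same mini-batch that produced the gradient, so no extra full-batch evaluations are required.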

Overparameterized nonlinear learning: Gradient descent takes the shortest path?

S Oymak, M Soltanolkotabi - International Conference on …, 2019 - proceedings.mlr.press
Many modern learning tasks involve fitting nonlinear models which are trained in an
overparameterized regime where the parameters of the model exceed the size of the …

Fine-grained analysis of stability and generalization for stochastic gradient descent

Y Lei, Y Ying - International Conference on Machine …, 2020 - proceedings.mlr.press
Recently there has been a considerable amount of work devoted to the study of algorithmic
stability and generalization for stochastic gradient descent (SGD). However, the existing …
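
For context, the classical notion behind this line of work is uniform stability (the paper itself studies finer-grained variants): a randomized algorithm $A$ is $\epsilon$-uniformly stable if, for all datasets $S, S'$ differing in a single example and all test points $z$,

$$ \big|\, \mathbb{E}[\ell(A(S); z)] - \mathbb{E}[\ell(A(S'); z)] \,\big| \;\le\; \epsilon, $$

and $\epsilon$-uniform stability implies an expected generalization gap of at most $\epsilon$.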

Faster non-convex federated learning via global and local momentum

R Das, A Acharya, A Hashemi… - Uncertainty in …, 2022 - proceedings.mlr.press
We propose FedGLOMO, a novel federated learning (FL) algorithm with an
iteration complexity of $\mathcal{O}(\epsilon^{-1.5})$ to converge to an $\epsilon …
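
To make the "global and local momentum" idea concrete, here is a generic sketch of combining client-side and server-side momentum in federated averaging. It is illustrative only and is not the FedGLOMO update, which the truncated snippet does not spell out; all function and variable names are hypothetical.

```python
# Illustrative only: generic local (client) momentum plus global (server) momentum in FL.
import numpy as np

def local_update(w, grads, local_lr=0.01, local_beta=0.9, local_steps=5):
    """Run a few SGD-with-momentum steps on one client; `grads` returns a stochastic gradient."""
    m = np.zeros_like(w)
    for _ in range(local_steps):
        g = grads(w)
        m = local_beta * m + g        # local momentum buffer
        w = w - local_lr * m
    return w

def federated_round(w_server, v_server, client_grad_fns, server_lr=1.0, global_beta=0.9):
    """One round: clients run local momentum SGD; the server applies momentum to the averaged update."""
    deltas = [local_update(w_server.copy(), grads) - w_server for grads in client_grad_fns]
    avg_delta = np.mean(deltas, axis=0)
    v_server = global_beta * v_server + (1 - global_beta) * avg_delta  # global momentum
    return w_server + server_lr * v_server, v_server
```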

The implicit regularization of stochastic gradient flow for least squares

A Ali, E Dobriban, R Tibshirani - International conference on …, 2020 - proceedings.mlr.press
We study the implicit regularization of mini-batch stochastic gradient descent, when applied
to the fundamental problem of least squares regression. We leverage a continuous-time …
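
The continuous-time object here is gradient flow on the least-squares objective: for a design matrix $X \in \mathbb{R}^{n \times p}$ and response vector $y$, with $w(0) = 0$,

$$ \dot{w}(t) \;=\; -\tfrac{1}{n}\, X^\top\!\big(Xw(t) - y\big), $$

and in this line of work the flow at time $t$ is compared to ridge regression with a penalty on the order of $1/t$; stochastic gradient flow adds a noise term to this ODE to model mini-batching. The precise correspondence is the subject of the paper, so the scaling stated here is only indicative.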

On the lower bound of minimizing polyak-łojasiewicz functions

P Yue, C Fang, Z Lin - The Thirty Sixth Annual Conference …, 2023 - proceedings.mlr.press
The Polyak-Łojasiewicz (PL) condition (Polyak, 1963) is a weaker condition than
strong convexity, but it suffices to ensure global convergence of gradient descent …
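
For reference, a differentiable function $f$ with minimum value $f^*$ satisfies the PL condition with parameter $\mu > 0$ if

$$ \tfrac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\big(f(x) - f^*\big) \quad \text{for all } x, $$

which does not require convexity yet guarantees, for $L$-smooth $f$, that gradient descent with step size $1/L$ converges linearly: $f(x_k) - f^* \le (1 - \mu/L)^k\,(f(x_0) - f^*)$.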