Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation

M Belkin - Acta Numerica, 2021 - cambridge.org
In the past decade the mathematical theory of machine learning has lagged far behind the
triumphs of deep neural networks on practical challenges. However, the gap between theory …

Loss landscapes and optimization in over-parameterized non-linear systems and neural networks

C Liu, L Zhu, M Belkin - Applied and Computational Harmonic Analysis, 2022 - Elsevier
The success of deep learning is due, to a large extent, to the remarkable effectiveness of
gradient-based optimization methods applied to large neural networks. The purpose of this …
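
The snippet is cut off before the paper's framework is stated. Purely as background, and not necessarily in the paper's exact form, a typical condition in this line of work on over-parameterized systems is a PL-type inequality with zero optimal loss (the interpolation regime),

$$ \|\nabla \mathcal{L}(w)\|^2 \;\ge\; \mu\,\mathcal{L}(w) \quad \text{on a region of parameter space}, $$

under which gradient descent on $\mathcal{L}$ reaches a global minimizer at a linear rate despite non-convexity.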

Fast and faster convergence of sgd for over-parameterized models and an accelerated perceptron

S Vaswani, F Bach, M Schmidt - The 22nd international …, 2019 - proceedings.mlr.press
Modern machine learning focuses on highly expressive models that are able to fit or
interpolate the data completely, resulting in zero training loss. For such models, we show …
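
The snippet appeals to interpolation (zero training loss). A common formalization in this line of work, stated here only as background and not necessarily in the paper's exact form, is that a single parameter vector minimizes every individual loss, so every stochastic gradient vanishes at the solution:

$$ f_i(w^*) = \min_w f_i(w) \ \text{ for all } i \quad \Longrightarrow \quad \nabla f_i(w^*) = 0 \ \text{ for all } i, $$

often strengthened to a growth condition such as $\mathbb{E}_i\|\nabla f_i(w)\|^2 \le \rho\,\|\nabla f(w)\|^2$, under which constant-step-size SGD matches the convergence rate of full-batch gradient descent.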

Mixed-privacy forgetting in deep networks

A Golatkar, A Achille, A Ravichandran… - Proceedings of the …, 2021 - openaccess.thecvf.com
We show that the influence of a subset of the training samples can be removed--or "forgotten"--from the weights of a network trained on large-scale image classification tasks …

Painless stochastic gradient: Interpolation, line-search, and convergence rates

S Vaswani, A Mishkin, I Laradji… - Advances in neural …, 2019 - proceedings.neurips.cc
Recent works have shown that stochastic gradient descent (SGD) achieves the fast
convergence rates of full-batch gradient descent for over-parameterized models satisfying …
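
As a concrete illustration of a stochastic line search under interpolation, here is a minimal sketch of one standard backtracking (Armijo) variant; it is not necessarily the paper's exact procedure, and all names below are placeholders.

```python
# Sketch: SGD where the step size for each mini-batch is chosen by backtracking
# until a stochastic Armijo (sufficient-decrease) condition holds on that same batch.
import numpy as np

def sgd_armijo_step(w, grad_fn, loss_fn, batch, eta0=1.0, c=0.5, beta=0.7, max_backtracks=50):
    """One SGD step with a backtracking line search evaluated on the sampled mini-batch."""
    g = grad_fn(w, batch)          # stochastic gradient on this batch
    loss = loss_fn(w, batch)       # loss on the same batch
    eta = eta0
    for _ in range(max_backtracks):
        w_new = w - eta * g
        # Stochastic Armijo condition: sufficient decrease of the mini-batch loss.
        if loss_fn(w_new, batch) <= loss - c * eta * np.dot(g, g):
            return w_new, eta
        eta *= beta                # shrink the step and try again
    return w - eta * g, eta
```

The key point is that the sufficient-decrease test uses the same mini-batch that produced the gradient, so no extra full-batch evaluations are required.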

Overparameterized nonlinear learning: Gradient descent takes the shortest path?

S Oymak, M Soltanolkotabi - International Conference on …, 2019 - proceedings.mlr.press
Many modern learning tasks involve fitting nonlinear models which are trained in an
overparameterized regime where the parameters of the model exceed the size of the …

Fine-grained analysis of stability and generalization for stochastic gradient descent

Y Lei, Y Ying - International Conference on Machine …, 2020 - proceedings.mlr.press
Recently there has been a considerable amount of work devoted to the study of algorithmic
stability and generalization for stochastic gradient descent (SGD). However, the existing …
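
For context, the classical notion behind this line of work is uniform stability (the paper itself studies finer-grained variants): a randomized algorithm $A$ is $\epsilon$-uniformly stable if, for all datasets $S, S'$ differing in a single example and all test points $z$,

$$ \big|\, \mathbb{E}[\ell(A(S); z)] - \mathbb{E}[\ell(A(S'); z)] \,\big| \;\le\; \epsilon, $$

and $\epsilon$-uniform stability implies an expected generalization gap of at most $\epsilon$.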

Faster non-convex federated learning via global and local momentum

R Das, A Acharya, A Hashemi… - Uncertainty in …, 2022 - proceedings.mlr.press
We propose FedGLOMO, a novel federated learning (FL) algorithm with an
iteration complexity of $\mathcal{O}(\epsilon^{-1.5})$ to converge to an $\epsilon …
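
To make the "global and local momentum" idea concrete, here is a generic sketch of combining client-side and server-side momentum in federated averaging. It is illustrative only and is not the FedGLOMO update, which the truncated snippet does not spell out; all function and variable names are hypothetical.

```python
# Illustrative only: generic local (client) momentum plus global (server) momentum in FL.
import numpy as np

def local_update(w, grads, local_lr=0.01, local_beta=0.9, local_steps=5):
    """Run a few SGD-with-momentum steps on one client; `grads` returns a stochastic gradient."""
    m = np.zeros_like(w)
    for _ in range(local_steps):
        g = grads(w)
        m = local_beta * m + g        # local momentum buffer
        w = w - local_lr * m
    return w

def federated_round(w_server, v_server, client_grad_fns, server_lr=1.0, global_beta=0.9):
    """One round: clients run local momentum SGD; the server applies momentum to the averaged update."""
    deltas = [local_update(w_server.copy(), grads) - w_server for grads in client_grad_fns]
    avg_delta = np.mean(deltas, axis=0)
    v_server = global_beta * v_server + (1 - global_beta) * avg_delta  # global momentum
    return w_server + server_lr * v_server, v_server
```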

The implicit regularization of stochastic gradient flow for least squares

A Ali, E Dobriban, R Tibshirani - International conference on …, 2020 - proceedings.mlr.press
We study the implicit regularization of mini-batch stochastic gradient descent, when applied
to the fundamental problem of least squares regression. We leverage a continuous-time …
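
The continuous-time object here is gradient flow on the least-squares objective: for a design matrix $X \in \mathbb{R}^{n \times p}$ and response vector $y$, with $w(0) = 0$,

$$ \dot{w}(t) \;=\; -\tfrac{1}{n}\, X^\top\!\big(Xw(t) - y\big), $$

and in this line of work the flow at time $t$ is compared to ridge regression with a penalty on the order of $1/t$; stochastic gradient flow adds a noise term to this ODE to model mini-batching. The precise correspondence is the subject of the paper, so the scaling stated here is only indicative.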

On the lower bound of minimizing polyak-łojasiewicz functions

P Yue, C Fang, Z Lin - The Thirty Sixth Annual Conference …, 2023 - proceedings.mlr.press
The Polyak-Łojasiewicz (PL) condition (Polyak, 1963) is a weaker condition than
strong convexity, but it suffices to ensure global convergence of gradient descent …
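
For reference, a differentiable function $f$ with minimum value $f^*$ satisfies the PL condition with parameter $\mu > 0$ if

$$ \tfrac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\big(f(x) - f^*\big) \quad \text{for all } x, $$

which does not require convexity yet guarantees, for $L$-smooth $f$, that gradient descent with step size $1/L$ converges linearly: $f(x_k) - f^* \le (1 - \mu/L)^k\,(f(x_0) - f^*)$.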