Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation
M Belkin - Acta Numerica, 2021 - cambridge.org
In the past decade the mathematical theory of machine learning has lagged far behind the
triumphs of deep neural networks on practical challenges. However, the gap between theory …
Loss landscapes and optimization in over-parameterized non-linear systems and neural networks
The success of deep learning is due, to a large extent, to the remarkable effectiveness of
gradient-based optimization methods applied to large neural networks. The purpose of this …
Fast and faster convergence of SGD for over-parameterized models and an accelerated perceptron
Modern machine learning focuses on highly expressive models that are able to fit or
interpolate the data completely, resulting in zero training loss. For such models, we show …
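As a point of reference for the interpolation setting this snippet invokes (notation added here, not quoted from the paper): the empirical loss is an average of per-sample losses, and interpolation means a single parameter vector drives every per-sample loss to zero,
$$ L(w) = \frac{1}{n}\sum_{i=1}^{n} \ell_i(w), \qquad \exists\, w^{\ast}:\ \ell_i(w^{\ast}) = 0 \ \text{for all } i, $$
so that $\nabla \ell_i(w^{\ast}) = 0$ for every sample and the stochastic-gradient noise vanishes at the solution, which is, roughly, the property such analyses exploit to recover full-batch rates for constant-step-size SGD.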
Mixed-privacy forgetting in deep networks
We show that the influence of a subset of the training samples can be removed -- or
"forgotten" -- from the weights of a network trained on large-scale image classification tasks …
forgotten"--from the weights of a network trained on large-scale image classification tasks …
Painless stochastic gradient: Interpolation, line-search, and convergence rates
Recent works have shown that stochastic gradient descent (SGD) achieves the fast
convergence rates of full-batch gradient descent for over-parameterized models satisfying …
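To make the line-search idea in the title concrete, here is a minimal Python sketch of a backtracking (Armijo) step computed on the current mini-batch; the helper names (loss_fn, grad_fn) and the constants are illustrative assumptions, not the paper's exact procedure.

import numpy as np

def sgd_armijo_step(w, loss_fn, grad_fn, batch, lr_max=1.0, c=0.1, beta=0.5):
    # One SGD step whose step size is found by backtracking until the Armijo
    # sufficient-decrease condition holds on the current mini-batch.
    g = grad_fn(w, batch)
    loss0 = loss_fn(w, batch)
    lr = lr_max
    while loss_fn(w - lr * g, batch) > loss0 - c * lr * g.dot(g) and lr > 1e-8:
        lr *= beta  # shrink the step until sufficient decrease is observed
    return w - lr * g

# Illustrative usage on a synthetic least-squares problem.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = X @ np.ones(5)
loss_fn = lambda w, idx: 0.5 * np.mean((X[idx] @ w - y[idx]) ** 2)
grad_fn = lambda w, idx: X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)
w = np.zeros(5)
for _ in range(200):
    w = sgd_armijo_step(w, loss_fn, grad_fn, rng.choice(100, size=10, replace=False))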
Overparameterized nonlinear learning: Gradient descent takes the shortest path?
Many modern learning tasks involve fitting nonlinear models which are trained in an
overparameterized regime where the parameters of the model exceed the size of the …
Fine-grained analysis of stability and generalization for stochastic gradient descent
Recently there has been a considerable amount of work devoted to the study of the algorithmic
stability and generalization for stochastic gradient descent (SGD). However, the existing …
Faster non-convex federated learning via global and local momentum
We propose FedGLOMO, a novel federated learning (FL) algorithm with an
iteration complexity of $\mathcal{O}(\epsilon^{-1.5})$ to converge to an $\epsilon$ …
The implicit regularization of stochastic gradient flow for least squares
We study the implicit regularization of mini-batch stochastic gradient descent, when applied
to the fundamental problem of least squares regression. We leverage a continuous-time …
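For background on the continuous-time viewpoint mentioned here (standard notation added by the editor, not the paper's exact model): for the least-squares loss, plain gradient flow is the ODE
$$ L(w) = \tfrac{1}{2n}\,\lVert Xw - y \rVert_2^2, \qquad \dot{w}(t) = -\nabla L(w(t)) = -\tfrac{1}{n}\,X^{\top}\bigl(Xw(t) - y\bigr), $$
and a stochastic variant of this flow adds a term modeling the mini-batch sampling noise; that continuous-time trajectory is the object the snippet's analysis refers to.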
On the lower bound of minimizing Polyak-Łojasiewicz functions
The Polyak-Łojasiewicz (PL) condition (Polyak, 1963) is weaker than
strong convexity but suffices to ensure global convergence for the Gradient Descent …
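For reference, the PL condition mentioned here has a standard one-line statement (the textbook form, not quoted from this paper): a differentiable $f$ with minimum value $f^{\star}$ satisfies PL with constant $\mu > 0$ if
$$ \tfrac{1}{2}\,\lVert \nabla f(x) \rVert_2^2 \;\ge\; \mu\,\bigl(f(x) - f^{\star}\bigr) \quad \text{for all } x, $$
and if $f$ is additionally $L$-smooth, gradient descent with step size $1/L$ satisfies $f(x_k) - f^{\star} \le (1 - \mu/L)^{k}\,\bigl(f(x_0) - f^{\star}\bigr)$, i.e. linear convergence without any convexity assumption.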