Towards a mathematical understanding of neural network-based machine learning: what we know and what we don't
The purpose of this article is to review the achievements made in the last few years towards
the understanding of the reasons behind the success and subtleties of neural network …
A unifying view on implicit bias in training linear neural networks
We study the implicit bias of gradient flow (i.e., gradient descent with infinitesimal step size)
on linear neural network training. We propose a tensor formulation of neural networks that …
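For reference, a sketch of the standard definition alluded to in the snippet (generic symbols, not the paper's notation; $\theta$ collects all weights and $\mathcal{L}$ is a generic training loss): gradient flow follows the ordinary differential equation
$$ \dot{\theta}(t) = -\nabla \mathcal{L}\big(\theta(t)\big). $$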
A mean field analysis of deep ResNet and beyond: Towards provably optimization via overparameterization from depth
Training deep neural networks with stochastic gradient descent (SGD) can often achieve
zero training loss on real-world tasks although the optimization landscape is known to be …
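As a reminder of the update referenced above (an illustrative sketch; the minibatch $B_k$, per-example loss $\ell$, and step size $\eta$ are generic symbols, not the paper's setup), one SGD step replaces the full gradient by a minibatch estimate:
$$ \theta_{k+1} = \theta_k - \eta\, \frac{1}{|B_k|} \sum_{i \in B_k} \nabla_\theta\, \ell(\theta_k; x_i, y_i). $$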
Continuous vs. discrete optimization of deep neural networks
Existing analyses of optimization in deep learning are either continuous, focusing on
(variants of) gradient flow, or discrete, directly treating (variants of) gradient descent …
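The two viewpoints contrasted above are linked by explicit (forward) Euler discretization (a standard observation, stated here in generic notation): discretizing the gradient flow $\dot{\theta} = -\nabla \mathcal{L}(\theta)$ with step size $\eta$ recovers gradient descent,
$$ \theta_{k+1} = \theta_k - \eta\, \nabla \mathcal{L}(\theta_k). $$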
Implicit regularization of deep residual networks towards neural ODEs
Residual neural networks are state-of-the-art deep learning models. Their continuous-depth
analog, neural ordinary differential equations (ODEs), are also widely used. Despite their …
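The continuous-depth correspondence mentioned above can be sketched as follows (generic notation, assuming $L$ residual blocks with $1/L$ scaling; not necessarily the paper's exact parameterization): a residual network is a forward Euler discretization of an ODE on $[0,1]$,
$$ x_{\ell+1} = x_\ell + \tfrac{1}{L}\, f_{\theta_\ell}(x_\ell) \quad\longleftrightarrow\quad \frac{dx(t)}{dt} = f_{\theta(t)}\big(x(t)\big). $$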
Wide neural networks as Gaussian processes: Lessons from deep equilibrium models
Neural networks with wide layers have attracted significant attention due to their
equivalence to Gaussian processes, enabling perfect fitting of training data while …
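For context on the deep equilibrium models named in the title (a standard sketch with generic symbols, not taken from the snippet): a DEQ defines its hidden representation implicitly as a fixed point of a single layer, from which the output is read off,
$$ z^\star = f_\theta(z^\star, x), \qquad \hat{y} = g_\phi(z^\star). $$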
On the global convergence of training deep linear ResNets
We study the convergence of gradient descent (GD) and stochastic gradient descent (SGD)
for training $L$-hidden-layer linear residual networks (ResNets). We prove that for training …
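A common parameterization of an $L$-hidden-layer linear ResNet, given here only as an illustrative sketch (the snippet does not specify the exact form), composes identity-plus-weight layers,
$$ x_{\ell+1} = \big(I + W_\ell\big)\, x_\ell, \qquad \ell = 1, \dots, L, $$
so the end-to-end map is the matrix product $(I + W_L) \cdots (I + W_1)$.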
On the convergence of gradient flow on multi-layer linear models
In this paper, we analyze the convergence of gradient flow on a multi-layer linear model with
a loss function of the form $f(W_1 W_2 \cdots W_L)$. We show that when $f$ satisfies the …
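Using the snippet's own objective $f(W_1 W_2 \cdots W_L)$ and writing $P = W_1 \cdots W_L$ for the end-to-end product, the gradient flow under study takes the form (a standard chain-rule computation):
$$ \dot{W}_j(t) = -\nabla_{W_j} f(P) = -\big(W_1 \cdots W_{j-1}\big)^{\!\top}\, \nabla f(P)\, \big(W_{j+1} \cdots W_L\big)^{\!\top}, \qquad j = 1, \dots, L. $$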
A modular analysis of provable acceleration via Polyak's momentum: Training a wide ReLU network and a deep linear network
Incorporating a so-called “momentum” dynamic in gradient descent methods is widely used
in neural net training as it has been broadly observed that, at least empirically, it often leads …
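The "momentum" dynamic referred to above is Polyak's heavy-ball update (standard form, with illustrative step size $\eta$ and momentum parameter $\beta$):
$$ \theta_{k+1} = \theta_k - \eta\, \nabla \mathcal{L}(\theta_k) + \beta\, \big(\theta_k - \theta_{k-1}\big). $$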
Convergence of gradient descent for learning linear neural networks
We study the convergence properties of gradient descent for training deep linear neural
networks, i.e., deep matrix factorizations, by extending a previous analysis for the related …
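In the deep-matrix-factorization view of a deep linear network mentioned in the snippet, a typical training objective and its gradient descent update look as follows (an illustrative sketch; the target $\Phi$, factor order, and step size $\eta$ are assumptions, not the paper's exact setup):
$$ \mathcal{L}(W_1, \dots, W_L) = \tfrac{1}{2}\, \big\| W_L W_{L-1} \cdots W_1 - \Phi \big\|_F^2, \qquad W_j^{(k+1)} = W_j^{(k)} - \eta\, \nabla_{W_j} \mathcal{L}\big(W_1^{(k)}, \dots, W_L^{(k)}\big). $$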