Towards a mathematical understanding of neural network-based machine learning: what we know and what we don't

C Ma, S Wojtowytsch, L Wu - arXiv preprint arXiv:2009.10713, 2020 - arxiv.org
The purpose of this article is to review the achievements made in the last few years towards
the understanding of the reasons behind the success and subtleties of neural network …

A unifying view on implicit bias in training linear neural networks

C Yun, S Krishnan, H Mobahi - arXiv preprint arXiv:2010.02501, 2020 - arxiv.org
We study the implicit bias of gradient flow (i.e., gradient descent with infinitesimal step size)
on linear neural network training. We propose a tensor formulation of neural networks that …
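For context, a minimal sketch of the setting (not the paper's tensor formulation): a depth-$L$ linear network computes $x \mapsto W_L \cdots W_1 x$, and gradient flow evolves each factor by $\dot W_j(t) = -\nabla_{W_j} \mathcal{L}\big(W_L(t) \cdots W_1(t)\big)$ for $j = 1, \dots, L$; the implicit-bias question asks which of the many zero-loss end-to-end products $W_L \cdots W_1$ this dynamic selects.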

A mean field analysis of deep ResNet and beyond: Towards provable optimization via overparameterization from depth

Y Lu, C Ma, Y Lu, J Lu, L Ying - International Conference on …, 2020 - proceedings.mlr.press
Training deep neural networks with stochastic gradient descent (SGD) can often achieve
zero training loss on real-world tasks although the optimization landscape is known to be …

Continuous vs. discrete optimization of deep neural networks

O Elkabetz, N Cohen - Advances in Neural Information …, 2021 - proceedings.neurips.cc
Existing analyses of optimization in deep learning are either continuous, focusing on
(variants of) gradient flow, or discrete, directly treating (variants of) gradient descent …
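For reference, the two objects being compared: gradient flow is the ODE $\dot\theta(t) = -\nabla L(\theta(t))$, while gradient descent is its explicit Euler discretization $\theta_{k+1} = \theta_k - \eta \nabla L(\theta_k)$ with step size $\eta$; the gap between the two trajectories is what results translating between the continuous and discrete viewpoints have to control.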

Implicit regularization of deep residual networks towards neural ODEs

P Marion, YH Wu, ME Sander, G Biau - arXiv preprint arXiv:2309.01213, 2023 - arxiv.org
Residual neural networks are state-of-the-art deep learning models. Their continuous-depth
analog, neural ordinary differential equations (ODEs), are also widely used. Despite their …
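For context, a sketch of the usual correspondence (assuming the standard $1/L$ scaling): a residual network updates $h_{k+1} = h_k + \frac{1}{L} f_{\theta_k}(h_k)$ for $k = 0, \dots, L-1$, and as the depth $L \to \infty$ these updates become Euler steps for the neural ODE $\frac{dh}{dt}(t) = f_{\theta(t)}(h(t))$.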

Wide neural networks as Gaussian processes: Lessons from deep equilibrium models

T Gao, X Huo, H Liu, H Gao - Advances in Neural …, 2023 - proceedings.neurips.cc
Neural networks with wide layers have attracted significant attention due to their
equivalence to Gaussian processes, enabling perfect fitting of training data while …
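For context, a sketch of the standard width-to-Gaussian-process argument for a fully connected network (not specific to deep equilibrium models): with activation $\phi$ and i.i.d. weights of variance $\sigma_w^2$ (biases of variance $\sigma_b^2$), the pre-activations of layer $l+1$ converge, as the width grows, to a Gaussian process whose covariance obeys the recursion $K^{(l+1)}(x, x') = \sigma_b^2 + \sigma_w^2\, \mathbb{E}_{u \sim \mathcal{GP}(0, K^{(l)})}\big[\phi(u(x))\,\phi(u(x'))\big]$.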

On the global convergence of training deep linear ResNets

D Zou, PM Long, Q Gu - arXiv preprint arXiv:2003.01094, 2020 - arxiv.org
We study the convergence of gradient descent (GD) and stochastic gradient descent (SGD)
for training $L$-hidden-layer linear residual networks (ResNets). We prove that for training …
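A minimal sketch of the architecture in question (the paper's precise input/output scaling may differ): a deep linear ResNet composes residual blocks as $x \mapsto (I + U_L)(I + U_{L-1}) \cdots (I + U_1)\, x$, and the convergence question is whether GD/SGD on, e.g., a squared loss over this product drives the training loss to its global minimum.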

On the convergence of gradient flow on multi-layer linear models

H Min, R Vidal, E Mallada - International Conference on …, 2023 - proceedings.mlr.press
In this paper, we analyze the convergence of gradient flow on a multi-layer linear model with
a loss function of the form $f(W_1 W_2 \cdots W_L)$. We show that when $f$ satisfies the …
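A minimal numerical sketch of this setup (assumed for illustration, not the paper's code): the script below simulates gradient flow by small-step Euler updates on the factors of a product $W_1 W_2 \cdots W_L$, with the hypothetical choice $f(P) = \frac{1}{2}\|P - M\|_F^2$ for a fixed target matrix $M$; the chain rule gives $\nabla_{W_j} f = A^\top (\nabla_P f)\, B^\top$ with $A = W_1 \cdots W_{j-1}$ and $B = W_{j+1} \cdots W_L$.

# Minimal sketch (assumed setup): gradient flow on f(W_1 W_2 ... W_L)
# via small-step Euler, with f(P) = 0.5 * ||P - M||_F^2.
import numpy as np

rng = np.random.default_rng(0)
L, d = 3, 4                                  # depth and layer width, chosen for illustration
M = rng.standard_normal((d, d))              # target matrix defining f
Ws = [np.eye(d) + 0.01 * rng.standard_normal((d, d)) for _ in range(L)]

def prod(factors):
    P = np.eye(d)
    for W in factors:
        P = P @ W
    return P

eta = 1e-3                                   # small step approximating the continuous-time flow
for _ in range(20000):
    P = prod(Ws)
    G = P - M                                # gradient of f with respect to the product P
    grads = [prod(Ws[:j]).T @ G @ prod(Ws[j+1:]).T for j in range(L)]
    for W, g in zip(Ws, grads):
        W -= eta * g                         # Euler step on dW_j/dt = -grad_{W_j} f

print(np.linalg.norm(prod(Ws) - M))          # residual should be small after training

Shrinking eta pushes these discrete iterates toward the continuous-time trajectory that analyses of this kind study.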

A modular analysis of provable acceleration via Polyak's momentum: Training a wide ReLU network and a deep linear network

JK Wang, CH Lin, JD Abernethy - … Conference on Machine …, 2021 - proceedings.mlr.press
Incorporating a so-called “momentum” dynamic in gradient descent methods is widely used
in neural net training as it has been broadly observed that, at least empirically, it often leads …
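For reference, Polyak's heavy-ball update on an objective $f$ with step size $\eta$ and momentum parameter $\beta$ reads $w_{t+1} = w_t - \eta \nabla f(w_t) + \beta\,(w_t - w_{t-1})$; the acceleration question is how much faster this iteration converges than plain gradient descent on the wide ReLU and deep linear models considered.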

Convergence of gradient descent for learning linear neural networks

GM Nguegnang, H Rauhut, U Terstiege - Advances in Continuous and …, 2024 - Springer
We study the convergence properties of gradient descent for training deep linear neural
networks, i.e., deep matrix factorizations, by extending a previous analysis for the related …