Towards a mathematical understanding of neural network-based machine learning: what we know and what we don't

C Ma, S Wojtowytsch, L Wu - arXiv preprint arXiv:2009.10713, 2020 - arxiv.org
The purpose of this article is to review the achievements made in the last few years towards
the understanding of the reasons behind the success and subtleties of neural network …

A unifying view on implicit bias in training linear neural networks

C Yun, S Krishnan, H Mobahi - arXiv preprint arXiv:2010.02501, 2020 - arxiv.org
We study the implicit bias of gradient flow (i.e., gradient descent with infinitesimal step size)
on linear neural network training. We propose a tensor formulation of neural networks that …
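For context, a minimal sketch of the setting (not the paper's tensor formulation): a depth-$L$ linear network computes $x \mapsto W_L \cdots W_1 x$, and gradient flow evolves each factor by $\dot W_j(t) = -\nabla_{W_j} \mathcal{L}\big(W_L(t) \cdots W_1(t)\big)$ for $j = 1, \dots, L$; the implicit-bias question asks which of the many zero-loss end-to-end products $W_L \cdots W_1$ this dynamic selects.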

A mean field analysis of deep ResNet and beyond: Towards provable optimization via overparameterization from depth

Y Lu, C Ma, Y Lu, J Lu, L Ying - International Conference on …, 2020 - proceedings.mlr.press
Training deep neural networks with stochastic gradient descent (SGD) can often achieve
zero training loss on real-world tasks although the optimization landscape is known to be …

Continuous vs. discrete optimization of deep neural networks

O Elkabetz, N Cohen - Advances in Neural Information …, 2021 - proceedings.neurips.cc
Existing analyses of optimization in deep learning are either continuous, focusing on
(variants of) gradient flow, or discrete, directly treating (variants of) gradient descent …
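For reference, the two objects being compared: gradient flow is the ODE $\dot\theta(t) = -\nabla L(\theta(t))$, while gradient descent is its explicit Euler discretization $\theta_{k+1} = \theta_k - \eta \nabla L(\theta_k)$ with step size $\eta$; the gap between the two trajectories is what results translating between the continuous and discrete viewpoints have to control.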

Implicit regularization of deep residual networks towards neural ODEs

P Marion, YH Wu, ME Sander, G Biau - arXiv preprint arXiv:2309.01213, 2023 - arxiv.org
Residual neural networks are state-of-the-art deep learning models. Their continuous-depth
analog, neural ordinary differential equations (ODEs), are also widely used. Despite their …
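For context, a sketch of the usual correspondence (assuming the standard $1/L$ scaling): a residual network updates $h_{k+1} = h_k + \frac{1}{L} f_{\theta_k}(h_k)$ for $k = 0, \dots, L-1$, and as the depth $L \to \infty$ these updates become Euler steps for the neural ODE $\frac{dh}{dt}(t) = f_{\theta(t)}(h(t))$.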

Wide neural networks as Gaussian processes: Lessons from deep equilibrium models

T Gao, X Huo, H Liu, H Gao - Advances in Neural …, 2023 - proceedings.neurips.cc
Neural networks with wide layers have attracted significant attention due to their
equivalence to Gaussian processes, enabling perfect fitting of training data while …
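For context, a sketch of the standard width-to-Gaussian-process argument for a fully connected network (not specific to deep equilibrium models): with activation $\phi$ and i.i.d. weights of variance $\sigma_w^2$ (biases of variance $\sigma_b^2$), the pre-activations of layer $l+1$ converge, as the width grows, to a Gaussian process whose covariance obeys the recursion $K^{(l+1)}(x, x') = \sigma_b^2 + \sigma_w^2\, \mathbb{E}_{u \sim \mathcal{GP}(0, K^{(l)})}\big[\phi(u(x))\,\phi(u(x'))\big]$.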

On the global convergence of training deep linear ResNets

D Zou, PM Long, Q Gu - arXiv preprint arXiv:2003.01094, 2020 - arxiv.org
We study the convergence of gradient descent (GD) and stochastic gradient descent (SGD)
for training $L$-hidden-layer linear residual networks (ResNets). We prove that for training …
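A minimal sketch of the architecture in question (the paper's precise input/output scaling may differ): a deep linear ResNet composes residual blocks as $x \mapsto (I + U_L)(I + U_{L-1}) \cdots (I + U_1)\, x$, and the convergence question is whether GD/SGD on, e.g., a squared loss over this product drives the training loss to its global minimum.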

On the convergence of gradient flow on multi-layer linear models

H Min, R Vidal, E Mallada - International Conference on …, 2023 - proceedings.mlr.press
In this paper, we analyze the convergence of gradient flow on a multi-layer linear model with
a loss function of the form $f(W_1 W_2 \cdots W_L)$. We show that when $f$ satisfies the …
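A minimal numerical sketch of this setup (assumed for illustration, not the paper's code): the script below simulates gradient flow by small-step Euler updates on the factors of a product $W_1 W_2 \cdots W_L$, with the hypothetical choice $f(P) = \frac{1}{2}\|P - M\|_F^2$ for a fixed target matrix $M$; the chain rule gives $\nabla_{W_j} f = A^\top (\nabla_P f)\, B^\top$ with $A = W_1 \cdots W_{j-1}$ and $B = W_{j+1} \cdots W_L$.

# Minimal sketch (assumed setup): gradient flow on f(W_1 W_2 ... W_L)
# via small-step Euler, with f(P) = 0.5 * ||P - M||_F^2.
import numpy as np

rng = np.random.default_rng(0)
L, d = 3, 4                                  # depth and layer width, chosen for illustration
M = rng.standard_normal((d, d))              # target matrix defining f
Ws = [np.eye(d) + 0.01 * rng.standard_normal((d, d)) for _ in range(L)]

def prod(factors):
    P = np.eye(d)
    for W in factors:
        P = P @ W
    return P

eta = 1e-3                                   # small step approximating the continuous-time flow
for _ in range(20000):
    P = prod(Ws)
    G = P - M                                # gradient of f with respect to the product P
    grads = [prod(Ws[:j]).T @ G @ prod(Ws[j+1:]).T for j in range(L)]
    for W, g in zip(Ws, grads):
        W -= eta * g                         # Euler step on dW_j/dt = -grad_{W_j} f

print(np.linalg.norm(prod(Ws) - M))          # residual should be small after training

Shrinking eta pushes these discrete iterates toward the continuous-time trajectory that analyses of this kind study.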

A modular analysis of provable acceleration via Polyak's momentum: Training a wide ReLU network and a deep linear network

JK Wang, CH Lin, JD Abernethy - … Conference on Machine …, 2021 - proceedings.mlr.press
Incorporating a so-called “momentum” dynamic in gradient descent methods is widely used
in neural net training as it has been broadly observed that, at least empirically, it often leads …
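For reference, Polyak's heavy-ball update on an objective $f$ with step size $\eta$ and momentum parameter $\beta$ reads $w_{t+1} = w_t - \eta \nabla f(w_t) + \beta\,(w_t - w_{t-1})$; the acceleration question is how much faster this iteration converges than plain gradient descent on the wide ReLU and deep linear models considered.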

Convergence of gradient descent for learning linear neural networks

GM Nguegnang, H Rauhut, U Terstiege - Advances in Continuous and …, 2024 - Springer
We study the convergence properties of gradient descent for training deep linear neural
networks, i.e., deep matrix factorizations, by extending a previous analysis for the related …