Overview frequency principle/spectral bias in deep learning

ZQJ Xu, Y Zhang, T Luo - Communications on Applied Mathematics and …, 2024 - Springer
Understanding deep learning is increasingly important as it penetrates more and more of
industry and science. In recent years, a line of research based on Fourier analysis has shed light on …
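
As a quick illustration of the phenomenon this survey covers (a minimal numpy sketch, not code from the paper; the network size, frequencies, and training settings below are arbitrary choices): fit a superposition of a low-frequency and a high-frequency sine with a small two-layer tanh network and track the Fourier amplitude of the residual at each frequency.

    import numpy as np

    # Illustrative target: a low-frequency (k=1) plus a high-frequency (k=5) sine.
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 1.0, 256, endpoint=False).reshape(-1, 1)
    y = np.sin(2 * np.pi * x) + 0.5 * np.sin(2 * np.pi * 5 * x)

    # Small two-layer tanh network trained by full-batch gradient descent on the MSE.
    m = 200
    W1 = rng.normal(0.0, 10.0, (1, m)); b1 = rng.uniform(-10.0, 0.0, m)
    W2 = rng.normal(0.0, 1.0 / np.sqrt(m), (m, 1)); b2 = np.zeros(1)
    lr, n = 5e-3, len(x)

    def residual_amp(err, k):
        # Amplitude of frequency k in the residual on the uniform grid over [0, 1).
        return np.abs(np.exp(-2j * np.pi * k * x[:, 0]) @ err[:, 0]) / n

    for step in range(10001):
        h = np.tanh(x @ W1 + b1)
        err = h @ W2 + b2 - y
        gW2 = h.T @ err / n                        # backprop for the mean-squared-error loss
        gh = (err @ W2.T) * (1.0 - h ** 2)
        gW1 = x.T @ gh / n
        W2 -= lr * gW2; b2 -= lr * err.mean(0)
        W1 -= lr * gW1; b1 -= lr * gh.mean(0)
        if step % 2000 == 0:
            print(step, residual_amp(err, 1), residual_amp(err, 5))
    # The k=1 component of the error typically decays well before the k=5 component,
    # which is the frequency-principle / spectral-bias behavior the survey discusses.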

On lazy training in differentiable programming

L Chizat, E Oyallon, F Bach - Advances in neural …, 2019 - proceedings.neurips.cc
In a series of recent theoretical works, it was shown that strongly over-parameterized neural
networks trained with gradient-based methods could converge exponentially fast to zero …
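
A minimal numpy sketch of the lazy-training scaling discussed there (our own toy setup, not the paper's): scale the centered model output by a factor alpha and run gradient descent on the alpha^{-2}-rescaled squared loss. As alpha grows, the fit stays comparable while the weights move less and less from their initialization, which is the "lazy" regime in which the dynamics stay close to the linearized model.

    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(64, 5))
    y = np.sin(X @ rng.normal(size=5)).reshape(-1, 1)

    def init(m=128, seed=1):
        r = np.random.default_rng(seed)
        return r.normal(size=(5, m)) / np.sqrt(5), r.normal(size=(m, 1)) / np.sqrt(m)

    for alpha in (1.0, 10.0, 100.0):
        W1, W2 = init()
        W1_0, W2_0 = W1.copy(), W2.copy()
        f0 = np.tanh(X @ W1_0) @ W2_0                # subtract so the scaled model starts at 0
        lr = 0.2
        for _ in range(3000):
            H = np.tanh(X @ W1)
            out = alpha * (H @ W2 - f0)
            # Gradient of the rescaled objective mean((out - y)**2) / (2 * alpha**2).
            e = (out - y) / (len(X) * alpha)
            gW2 = H.T @ e
            gW1 = X.T @ ((e @ W2.T) * (1.0 - H ** 2))
            W1 -= lr * gW1; W2 -= lr * gW2
        move = np.linalg.norm(W1 - W1_0) / np.linalg.norm(W1_0)
        fit = np.mean((alpha * (np.tanh(X @ W1) @ W2 - f0) - y) ** 2)
        print(f"alpha={alpha:6.1f}  train MSE={fit:.4f}  relative weight movement={move:.4f}")
    # Larger alpha: a comparable fit, but the parameters barely move (lazy/linearized regime).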

The generalization error of random features regression: Precise asymptotics and the double descent curve

S Mei, A Montanari - Communications on Pure and Applied …, 2022 - Wiley Online Library
Deep learning methods operate in regimes that defy the traditional statistical mindset.
Neural network architectures often contain more parameters than training samples, and are …
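
The double descent curve they analyze can be reproduced qualitatively in a few lines (a rough finite-size sketch with random ReLU features and min-norm least squares; the dimensions and sample sizes are arbitrary choices, not the paper's precise asymptotic setting):

    import numpy as np

    rng = np.random.default_rng(0)
    d, n, n_test = 20, 200, 2000
    beta = rng.normal(size=d) / np.sqrt(d)

    def sample(size):
        X = rng.normal(size=(size, d))
        return X, X @ beta + 0.1 * rng.normal(size=size)

    Xtr, ytr = sample(n)
    Xte, yte = sample(n_test)

    for N in (20, 50, 100, 190, 200, 210, 400, 1000, 4000):
        W = rng.normal(size=(d, N)) / np.sqrt(d)        # fixed random first-layer weights
        Ftr, Fte = np.maximum(Xtr @ W, 0), np.maximum(Xte @ W, 0)
        a, *_ = np.linalg.lstsq(Ftr, ytr, rcond=None)   # min-norm least squares on the features
        print(f"N={N:5d}  test MSE={np.mean((Fte @ a - yte) ** 2):.3f}")
    # The test error typically peaks near the interpolation threshold N = n
    # and decreases again as N grows past it (the double descent shape).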

Implicit bias of gradient descent for wide two-layer neural networks trained with the logistic loss

L Chizat, F Bach - Conference on learning theory, 2020 - proceedings.mlr.press
Neural networks trained to minimize the logistic (aka cross-entropy) loss with gradient-based
methods are observed to perform well in many supervised classification tasks. Towards …
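
For orientation, a hedged summary of the kind of characterization this line of work gives, stated from memory rather than from the snippet: for suitably scaled, infinitely wide two-layer networks, the direction selected by gradient flow on the logistic loss is described as a max-margin classifier over a variation-norm ball, roughly

    \max_{\|f\|_{\mathcal{F}_1} \le 1} \; \min_{1 \le i \le n} \; y_i f(x_i),

where \|\cdot\|_{\mathcal{F}_1} denotes the variation norm associated with the hidden-unit features.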

The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks

E Abbe, EB Adsera… - Conference on Learning …, 2022 - proceedings.mlr.press
It is currently known how to characterize functions that neural networks can learn with SGD
for two extremal parametrizations: neural networks in the linear regime, and neural networks …
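
A small illustrative checker for the property in the title, assuming the informal definition used in this line of work (a monomial support has the merged-staircase property if the monomials can be ordered so that each one introduces at most one previously unseen coordinate); the code is ours, not the paper's:

    def has_msp(monomials):
        # Greedy check: repeatedly pick any monomial that adds at most one new
        # coordinate to the set covered so far; succeed iff all get picked.
        remaining = [frozenset(m) for m in monomials]
        covered = set()
        while remaining:
            for m in remaining:
                if len(m - covered) <= 1:
                    covered |= m
                    remaining.remove(m)
                    break
            else:
                return False
        return True

    print(has_msp([{1}, {1, 2}, {1, 2, 3}]))   # True: the staircase x1 + x1*x2 + x1*x2*x3
    print(has_msp([{1, 2}]))                   # False: x1*x2 introduces two new coordinates at once
    print(has_msp([{1}, {2, 3}]))              # False
    print(has_msp([{1}, {2}, {1, 2, 3}]))      # True: after x1 and x2, the cubic adds only x3

Greedy selection suffices here because adding a feasible monomial only enlarges the covered coordinate set, so it can never rule out a later choice.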

Landscape and training regimes in deep learning

M Geiger, L Petrini, M Wyart - Physics Reports, 2021 - Elsevier
Deep learning algorithms are responsible for a technological revolution in a variety of tasks,
including image recognition and Go playing. Yet why they work is not understood. Ultimately …

Toward moderate overparameterization: Global convergence guarantees for training shallow neural networks

S Oymak, M Soltanolkotabi - IEEE Journal on Selected Areas in …, 2020 - ieeexplore.ieee.org
Many modern neural network architectures are trained in an overparameterized regime
where the parameters of the model exceed the size of the training dataset. Sufficiently …
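
As a quick, hedged illustration of the regime described here (not the paper's analysis, which concerns gradient descent on the nonconvex training problem; this only shows that interpolation becomes possible once the trainable parameters outnumber the samples): fit arbitrary labels with the top layer of a random shallow ReLU network and watch the training residual drop to zero once the width exceeds the sample count.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n = 20, 100
    X = rng.normal(size=(n, d)) / np.sqrt(d)
    y = rng.choice([-1.0, 1.0], size=n)            # arbitrary (random) labels

    for m in (10, 50, 100, 200, 800):
        W1 = rng.normal(size=(d, m))               # random hidden layer, held fixed here
        F = np.maximum(X @ W1, 0)                  # n x m ReLU feature matrix
        a, *_ = np.linalg.lstsq(F, y, rcond=None)  # least-squares fit of the top layer
        res = np.linalg.norm(F @ a - y) / np.linalg.norm(y)
        print(f"width m={m:4d}  relative training residual={res:.2e}")
    # Once m comfortably exceeds n the residual is numerically zero: the model can
    # interpolate the data, which is the overparameterized setting the guarantees address.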

Exploring deep neural networks via layer-peeled model: Minority collapse in imbalanced training

C Fang, H He, Q Long, WJ Su - Proceedings of the National …, 2021 - National Acad Sciences
In this paper, we introduce the Layer-Peeled Model, a nonconvex, yet analytically tractable,
optimization program, in a quest to better understand deep neural networks that are trained …
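
For reference, the layer-peeled program has the rough form below (written from memory and up to normalization conventions, so treat the exact constants as approximate): the last-layer features and the classifier are optimized directly under norm budgets, with the deeper layers "peeled off",

    \min_{\mathbf{W},\,\mathbf{H}} \ \frac{1}{N} \sum_{k=1}^{K} \sum_{i=1}^{n_k}
        \mathcal{L}\!\left(\mathbf{W}\mathbf{h}_{k,i},\, \mathbf{e}_k\right)
    \quad \text{s.t.} \quad
    \frac{1}{K} \sum_{k=1}^{K} \|\mathbf{w}_k\|_2^2 \le E_W, \qquad
    \frac{1}{N} \sum_{k=1}^{K} \sum_{i=1}^{n_k} \|\mathbf{h}_{k,i}\|_2^2 \le E_H,

where \mathcal{L} is the cross-entropy loss, n_k is the number of training samples in class k, and E_W, E_H are the norm budgets; the "minority collapse" of the title concerns what happens to the minority classes' classifier vectors when the n_k are strongly imbalanced.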

Linearized two-layers neural networks in high dimension

B Ghorbani, S Mei, T Misiakiewicz, A Montanari - 2021 - projecteuclid.org
The Supplementary Material contains the proofs of Theorem 1 (a) in Appendix A, Theorem 1
(b) in Appendix B, Proposition 2 in Appendix C, Theorem 2 (b) in Appendix D and Theorem …

High-dimensional limit theorems for SGD: Effective dynamics and critical scaling

G Ben Arous, R Gheissari… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the scaling limits of stochastic gradient descent (SGD) with constant step-size in
the high-dimensional regime. We prove limit theorems for the trajectories of summary …
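
A minimal numpy sketch of the kind of experiment behind such results (our own toy model, a noiseless linear target with step size c/d; this is not the paper's setting): run online SGD in several dimensions and record a summary statistic, the overlap with the target direction, at fixed rescaled times t = (number of samples)/d. As d grows, the trajectories concentrate around a deterministic effective dynamics.

    import numpy as np

    c, T = 1.0, 4.0
    for d in (100, 400, 1600):
        rng = np.random.default_rng(0)
        theta_star = np.zeros(d); theta_star[0] = 1.0
        theta = rng.normal(size=d) / np.sqrt(d)          # near-zero initial overlap
        eta = c / d                                      # constant step size, scaled with dimension
        checkpoints = {int(t * d): t for t in (0.5, 1.0, 2.0, 4.0)}
        out = []
        for k in range(1, int(T * d) + 1):
            xk = rng.normal(size=d)                      # fresh sample each step (online SGD)
            theta += eta * (xk @ theta_star - xk @ theta) * xk
            if k in checkpoints:
                out.append((checkpoints[k], theta @ theta_star))
        print(f"d={d:5d}  " + "  ".join(f"m({t})={m:.3f}" for t, m in out))
    # At fixed rescaled time t the overlap concentrates, as d grows, around the
    # deterministic curve 1 - exp(-c t) for this toy model (the "effective dynamics").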