Overview frequency principle/spectral bias in deep learning
Understanding deep learning has become increasingly important as it penetrates more and more into
industry and science. In recent years, a research line from Fourier analysis sheds light on …
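
A minimal sketch of the frequency principle / spectral bias described in this line of work, assuming a toy 1-D regression target with one low-frequency and one high-frequency component; the network size, optimizer, and target are illustrative choices, not taken from the paper. The script tracks how quickly each Fourier mode of the prediction matches the target:

# Frequency-principle sketch: fit a low- plus high-frequency target and watch
# which Fourier mode of the prediction converges first.
import torch

torch.manual_seed(0)
x = torch.linspace(-torch.pi, torch.pi, 256).unsqueeze(1)
y = torch.sin(x) + 0.5 * torch.sin(5 * x)          # low (k=1) + high (k=5) frequency

net = torch.nn.Sequential(
    torch.nn.Linear(1, 128), torch.nn.Tanh(),
    torch.nn.Linear(128, 128), torch.nn.Tanh(),
    torch.nn.Linear(128, 1),
)
opt = torch.optim.Adam(net.parameters(), lr=1e-3)

target_fft = torch.fft.rfft(y.squeeze())
for step in range(2001):
    opt.zero_grad()
    pred = net(x)
    loss = ((pred - y) ** 2).mean()
    loss.backward()
    opt.step()
    if step % 500 == 0:
        pred_fft = torch.fft.rfft(pred.detach().squeeze())
        # Relative error of the low-frequency (k=1) and high-frequency (k=5) modes.
        err = (pred_fft - target_fft).abs() / (target_fft.abs() + 1e-8)
        print(f"step {step:4d}  loss {loss.item():.4f}  "
              f"k=1 err {err[1].item():.3f}  k=5 err {err[5].item():.3f}")
# Typically the k=1 error drops well before the k=5 error, i.e. low
# frequencies are fitted first.
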
On lazy training in differentiable programming
In a series of recent theoretical works, it was shown that strongly over-parameterized neural
networks trained with gradient-based methods could converge exponentially fast to zero …
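
A minimal sketch of the lazy-training effect, assuming a two-layer ReLU network with 1/sqrt(width) output scaling trained by full-batch gradient descent on a tiny synthetic regression task (all illustrative choices, not the paper's setup); the wider the network, the less its weights move from initialization while the loss still decreases:

# Lazy-training sketch: relative weight movement shrinks as width grows.
import torch

torch.manual_seed(0)
x = torch.randn(32, 5)
y = torch.randn(32, 1)

def train(width, steps=500, lr=0.1):
    w1 = torch.randn(5, width, requires_grad=True)
    w2 = torch.randn(width, 1, requires_grad=True)
    w1_0, w2_0 = w1.detach().clone(), w2.detach().clone()
    for _ in range(steps):
        pred = torch.relu(x @ w1) @ w2 / width**0.5   # NTK-style output scaling
        loss = ((pred - y) ** 2).mean()
        loss.backward()
        with torch.no_grad():
            w1 -= lr * w1.grad
            w2 -= lr * w2.grad
            w1.grad.zero_()
            w2.grad.zero_()
    move = (torch.cat([(w1 - w1_0).flatten(), (w2 - w2_0).flatten()]).norm()
            / torch.cat([w1_0.flatten(), w2_0.flatten()]).norm())
    return loss.item(), move.item()

for width in (10, 100, 1000, 10000):
    loss, move = train(width)
    print(f"width {width:6d}  final loss {loss:.4f}  relative weight movement {move:.4f}")
# Expected trend: the relative movement of the weights shrinks with width,
# while the training loss still decreases; in the limit the model behaves
# like its linearization around initialization.
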
The generalization error of random features regression: Precise asymptotics and the double descent curve
Deep learning methods operate in regimes that defy the traditional statistical mindset.
Neural network architectures often contain more parameters than training samples, and are …
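
A minimal sketch of the double-descent curve for random-features regression, assuming a toy linear teacher, ReLU random features, and (near) ridgeless least squares solved via the pseudo-inverse; the sample size n is fixed while the number of features N is swept through the interpolation threshold N = n (all settings are illustrative):

# Double-descent sketch for random-features least squares.
import numpy as np

rng = np.random.default_rng(0)
d, n, n_test = 20, 100, 1000
beta = rng.standard_normal(d) / np.sqrt(d)          # teacher coefficients

X_train = rng.standard_normal((n, d))
X_test = rng.standard_normal((n_test, d))
y_train = X_train @ beta + 0.1 * rng.standard_normal(n)
y_test = X_test @ beta

for N in (20, 50, 90, 100, 110, 200, 500, 2000):
    W = rng.standard_normal((d, N)) / np.sqrt(d)    # random first-layer weights
    F_train = np.maximum(X_train @ W, 0.0)          # ReLU random features
    F_test = np.maximum(X_test @ W, 0.0)
    a = np.linalg.pinv(F_train) @ y_train           # min-norm least-squares fit
    test_err = np.mean((F_test @ a - y_test) ** 2)
    print(f"N = {N:5d}  test MSE = {test_err:.3f}")
# The test error typically spikes near N = n (the interpolation threshold)
# and decreases again as N grows further: the double-descent shape.
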
Implicit bias of gradient descent for wide two-layer neural networks trained with the logistic loss
Neural networks trained to minimize the logistic (aka cross-entropy) loss with gradient-based
methods are observed to perform well in many supervised classification tasks. Towards …
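
A minimal sketch of the implicit bias of gradient descent with the logistic loss, using a plain linear classifier on separable data rather than the wide two-layer networks studied in the paper, purely to keep the example short; the weight norm grows without bound while the normalized margin keeps improving:

# Implicit-bias sketch: GD on the logistic loss drifts toward a
# max-margin direction on linearly separable data.
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 2))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1.0, -1.0)   # separable labels

w = np.zeros(2)
lr = 0.5
for step in range(1, 20001):
    margins = y * (X @ w)
    p = 1.0 / (1.0 + np.exp(np.clip(margins, -50, 50)))   # sigmoid(-margin), stable
    grad = -(X.T @ (y * p)) / len(y)                       # logistic-loss gradient
    w -= lr * grad
    if step in (100, 1000, 10000, 20000):
        norm_margin = np.min(y * (X @ w)) / np.linalg.norm(w)
        print(f"step {step:6d}  ||w|| = {np.linalg.norm(w):7.2f}  "
              f"normalized margin = {norm_margin:.4f}")
# ||w|| keeps growing (the loss never reaches zero exactly) while the
# normalized margin increases, i.e. the direction of w approaches a
# max-margin separator.
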
The merged-staircase property: a necessary and nearly sufficient condition for SGD learning of sparse functions on two-layer neural networks
It is currently known how to characterize functions that neural networks can learn with SGD
for two extremal parametrizations: neural networks in the linear regime, and neural networks …
Landscape and training regimes in deep learning
Deep learning algorithms are responsible for a technological revolution in a variety of tasks,
including image recognition and Go playing. Yet, why they work is not understood. Ultimately …
Toward moderate overparameterization: Global convergence guarantees for training shallow neural networks
Many modern neural network architectures are trained in an overparameterized regime
where the parameters of the model exceed the size of the training dataset. Sufficiently …
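
A minimal sketch of the overparameterized global-convergence phenomenon, assuming a one-hidden-layer ReLU network trained by full-batch gradient descent to memorize random labels (a hypothetical setup chosen to isolate pure fitting, not the paper's construction); narrow networks plateau at a nonzero training loss, while sufficiently wide ones drive it to (near) zero:

# Overparameterization sketch: training loss vs. hidden width on random labels.
import torch

torch.manual_seed(0)
n, d = 50, 10
x = torch.randn(n, d)
y = torch.randn(n, 1)                       # random labels: pure memorization task

def final_train_loss(width, steps=3000, lr=0.05):
    net = torch.nn.Sequential(
        torch.nn.Linear(d, width), torch.nn.ReLU(),
        torch.nn.Linear(width, 1),
    )
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        loss = ((net(x) - y) ** 2).mean()
        loss.backward()
        opt.step()
    return loss.item()

for width in (5, 50, 500, 5000):
    print(f"width {width:5d}  final training loss {final_train_loss(width):.5f}")
# Narrow networks typically get stuck at a clearly nonzero loss; sufficiently
# wide ones fit the random labels almost exactly.
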
Exploring deep neural networks via layer-peeled model: Minority collapse in imbalanced training
In this paper, we introduce the Layer-Peeled Model, a nonconvex, yet analytically tractable,
optimization program, in a quest to better understand deep neural networks that are trained …
Linearized two-layers neural networks in high dimension
The Supplementary Material contains the proofs of Theorem 1 (a) in Appendix A, Theorem 1
(b) in Appendix B, Proposition 2 in Appendix C, Theorem 2 (b) in Appendix D and Theorem …
High-dimensional limit theorems for SGD: Effective dynamics and critical scaling
We study the scaling limits of stochastic gradient descent (SGD) with constant step-size in
the high-dimensional regime. We prove limit theorems for the trajectories of summary …