A selective overview of deep learning
Deep learning has achieved tremendous success in recent years. In simple words, deep
learning uses the composition of many nonlinear functions to model the complex …
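For concreteness, the composition referred to can be written generically (the notation here is illustrative, not the survey's own) as a feed-forward network
$$ f(x) \;=\; (f_L \circ \cdots \circ f_1)(x), \qquad f_\ell(z) \;=\; \sigma(W_\ell z + b_\ell), $$
where each layer applies an affine map followed by a nonlinearity $\sigma$.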
Gradient descent finds global minima of deep neural networks
Gradient descent finds a global minimum in training deep neural networks despite the
objective function being non-convex. The current paper proves gradient descent achieves …
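For reference, the method in question is plain gradient descent on the non-convex training loss $L(\theta)$; the symbols below are generic rather than the paper's notation:
$$ \theta_{t+1} \;=\; \theta_t - \eta\, \nabla L(\theta_t), $$
and the result concerns this update reaching a global minimizer of $L$ despite the non-convexity.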
Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks
Recent works have cast some light on the mystery of why deep nets fit any data and
generalize despite being very overparametrized. This paper analyzes training and …
Gradient descent provably optimizes over-parameterized neural networks
One of the mysteries in the success of neural networks is that randomly initialized first-order
methods like gradient descent can achieve zero training loss even though the objective …
Scan and snap: Understanding training dynamics and token composition in 1-layer transformer
The Transformer architecture has shown impressive performance in multiple research domains
and has become the backbone of many neural network models. However, there is limited …
Learning single-index models with shallow neural networks
Single-index models are a class of functions given by an unknown univariate "link" function
applied to an unknown one-dimensional projection of the input. These models are …
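Written out, a single-index model takes the form (notation illustrative):
$$ f(x) \;=\; g(\langle w, x\rangle), \qquad g:\mathbb{R}\to\mathbb{R} \ \text{and}\ w\in\mathbb{R}^d \ \text{both unknown}, $$
so the learner must recover both the one-dimensional projection $w$ and the univariate link $g$.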
Benign overfitting in two-layer convolutional neural networks
Modern neural networks often have great expressive power and can be trained to overfit the
training data, while still achieving a good test performance. This phenomenon is referred to …
Toward moderate overparameterization: Global convergence guarantees for training shallow neural networks
Many modern neural network architectures are trained in an overparameterized regime
where the parameters of the model exceed the size of the training dataset. Sufficiently …
Theoretical insights into the optimization landscape of over-parameterized shallow neural networks
In this paper, we study the problem of learning a shallow artificial neural network that best
fits a training data set. We study this problem in the over-parameterized regime where the …
Toward understanding the feature learning process of self-supervised contrastive learning
We formally study how contrastive learning learns the feature representations for neural
networks by investigating its feature learning process. We consider the case where our data …
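As background, a common instantiation of the contrastive objective (a generic InfoNCE-style form, not necessarily the exact loss analyzed in the paper) is
$$ \mathcal{L} \;=\; -\,\mathbb{E}\left[\log \frac{\exp\!\big(\mathrm{sim}(z, z^{+})/\tau\big)}{\exp\!\big(\mathrm{sim}(z, z^{+})/\tau\big) + \sum_{j} \exp\!\big(\mathrm{sim}(z, z_j^{-})/\tau\big)}\right], $$
where $z$ and $z^{+}$ are representations of two augmentations of the same input, the $z_j^{-}$ come from other inputs, and $\tau$ is a temperature parameter.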