Nonconvex optimization meets low-rank matrix factorization: An overview
Substantial progress has been made recently on developing provably accurate and efficient
algorithms for low-rank matrix factorization via nonconvex optimization. While conventional …
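The core object in this line of work is the factored formulation: rather than optimizing over a full matrix, one runs gradient descent directly on low-rank factors. Below is a minimal numpy sketch of that idea; the matrix sizes, step size, and initialization scale are illustrative assumptions, not taken from the overview.

    import numpy as np

    rng = np.random.default_rng(0)
    n, m, r = 50, 40, 3
    M = rng.standard_normal((n, r)) @ rng.standard_normal((r, m))   # ground-truth rank-r matrix
    M /= np.linalg.norm(M, 2)                                       # normalize spectral norm for a stable step size

    # Gradient descent on the nonconvex factored objective
    #   f(L, R) = 0.5 * || L R^T - M ||_F^2
    L = 0.1 * rng.standard_normal((n, r))
    R = 0.1 * rng.standard_normal((m, r))
    eta = 0.1
    for _ in range(2000):
        E = L @ R.T - M                                  # residual
        L, R = L - eta * E @ R, R - eta * E.T @ L        # simultaneous step on both factors
    print("relative error:", np.linalg.norm(L @ R.T - M) / np.linalg.norm(M))

With the true rank known and a small random start, this simple scheme typically drives the error close to machine precision; explaining why such nonconvex schemes succeed is the subject of the overview.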
Optimization for deep learning: An overview
RY Sun - Journal of the Operations Research Society of China, 2020 - Springer
Optimization is a critical component in deep learning. We think optimization for neural
networks is an interesting topic for theoretical research for several reasons. First, its …
Scan and snap: Understanding training dynamics and token composition in 1-layer transformer
The Transformer architecture has shown impressive performance in multiple research domains
and has become the backbone of many neural network models. However, there is limited …
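For concreteness, the 1-layer model in this line of work is essentially a single softmax self-attention layer followed by a decoder that predicts the next token. The numpy sketch below shows only the forward pass; the dimensions and weight shapes are illustrative assumptions, and the paper's exact parameterization may differ.

    import numpy as np

    rng = np.random.default_rng(0)
    T, d, vocab = 8, 16, 100                         # sequence length, embedding dim, vocabulary size
    X = rng.standard_normal((T, d))                  # embedded input tokens
    W_q, W_k, W_v = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
    W_dec = rng.standard_normal((d, vocab)) / np.sqrt(d)

    scores = (X @ W_q) @ (X @ W_k).T / np.sqrt(d)    # attention logits between all token pairs
    scores -= scores.max(axis=1, keepdims=True)      # numerical stability before the softmax
    A = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)   # softmax attention weights
    H = A @ (X @ W_v)                                # each position is a weighted mix of value vectors
    logits = H[-1] @ W_dec                           # next-token prediction from the last position

The training dynamics studied in the paper concern how the attention weights A evolve and which tokens they come to compose during gradient training.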
Understanding self-supervised learning dynamics without contrastive pairs
While contrastive approaches to self-supervised learning (SSL) learn representations by
minimizing the distance between two augmented views of the same data point (positive …
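The setting analyzed here is the BYOL/SimSiam family of non-contrastive methods: two augmented views, an extra predictor head on one branch, and a stop-gradient on the other, with no negative pairs. A minimal PyTorch sketch of that loss follows; the encoder and predictor sizes are illustrative assumptions.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 16))
    predictor = nn.Linear(16, 16)

    x1, x2 = torch.randn(8, 32), torch.randn(8, 32)   # stand-ins for two augmented views
    z1, z2 = encoder(x1), encoder(x2)
    p1, p2 = predictor(z1), predictor(z2)

    # Negative cosine similarity with stop-gradient on the target branch; without
    # the predictor and stop-gradient the representations collapse to a constant,
    # which is the dynamics question the paper studies.
    loss = -(F.cosine_similarity(p1, z2.detach(), dim=-1).mean()
             + F.cosine_similarity(p2, z1.detach(), dim=-1).mean()) / 2
    loss.backward()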
Towards understanding ensemble, knowledge distillation and self-distillation in deep learning
We formally study how an ensemble of deep learning models can improve test accuracy, and
how the superior performance of the ensemble can be distilled into a single model using …
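Distillation here is meant in the standard sense: the single student model is trained against the teacher's (or ensemble's) softened output distribution in addition to the hard labels. A minimal PyTorch sketch of that loss; the temperature and mixing weight are illustrative assumptions.

    import torch
    import torch.nn.functional as F

    def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
        """Weighted sum of hard-label cross-entropy and soft-label KL divergence."""
        hard = F.cross_entropy(student_logits, labels)
        soft = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                        F.softmax(teacher_logits / T, dim=-1),
                        reduction="batchmean") * (T * T)   # T^2 keeps gradient scale comparable
        return alpha * hard + (1 - alpha) * soft

    student_logits = torch.randn(8, 10, requires_grad=True)
    teacher_logits = torch.randn(8, 10)
    labels = torch.randint(0, 10, (8,))
    distillation_loss(student_logits, teacher_logits, labels).backward()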
Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks
Recent works have shed some light on the mystery of why deep nets fit any data and
generalize despite being very overparameterized. This paper analyzes training and …
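The fine-grained quantities in this analysis are built from the Gram matrix H^∞ of the infinite-width two-layer ReLU network; for unit-norm inputs it has a simple closed form, and terms like y^T (H^∞)^{-1} y control both the convergence rate and the generalization bound, up to constants. A small numpy sketch on random data (the data itself is an illustrative assumption):

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 20, 10
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)        # unit-norm inputs, as assumed in the analysis
    y = rng.choice([-1.0, 1.0], size=n)

    G = np.clip(X @ X.T, -1.0, 1.0)                      # pairwise inner products
    H_inf = G * (np.pi - np.arccos(G)) / (2 * np.pi)     # closed-form Gram matrix for ReLU features

    complexity = float(y @ np.linalg.solve(H_inf, y))    # y^T (H^inf)^{-1} y, the data-dependent complexity term
    print(complexity)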
A convergence theory for deep learning via over-parameterization
Deep neural networks (DNNs) have demonstrated dominant performance in many fields;
since AlexNet, networks used in practice have grown wider and deeper. On the theoretical …
Gradient descent finds global minima of deep neural networks
Gradient descent finds a global minimum in training deep neural networks despite the
objective function being non-convex. The current paper proves gradient descent achieves …
Learning and generalization in overparameterized neural networks, going beyond two layers
Gradient descent provably optimizes over-parameterized neural networks
One of the mysteries in the success of neural networks is that randomly initialized first-order
methods like gradient descent can achieve zero training loss even though the objective …
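The last several entries analyze variants of the same phenomenon, which is easy to reproduce at small scale: a sufficiently wide, randomly initialized network trained by plain gradient descent drives a nonconvex training loss essentially to zero. A minimal PyTorch sketch of that experiment; the width, step size, and random data are illustrative assumptions, not settings from any of the papers.

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    n, d, width = 32, 10, 4096                      # few samples, very wide hidden layer
    X, y = torch.randn(n, d), torch.randn(n, 1)

    net = nn.Sequential(nn.Linear(d, width), nn.ReLU(), nn.Linear(width, 1))
    opt = torch.optim.SGD(net.parameters(), lr=1e-3)   # plain full-batch gradient descent

    for step in range(10000):
        loss = ((net(X) - y) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(float(loss))   # typically falls by orders of magnitude, approaching zero training error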