Optimization for deep learning: An overview
RY Sun - Journal of the Operations Research Society of China, 2020 - Springer
Optimization is a critical component in deep learning. We think optimization for neural
networks is an interesting topic for theoretical research for several reasons. First, its …
The global landscape of neural networks: An overview
One of the major concerns for neural network training is that the nonconvexity of the
associated loss functions may cause a bad landscape. The recent success of neural …
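To make the "landscape" notion concrete, here is a minimal sketch (our own construction, not the paper's) that probes the loss of a small ReLU network along a random direction in parameter space; the model, data, and scaling are purely illustrative:

    import torch
    import torch.nn as nn

    # Toy setup: a small ReLU net and random data (illustrative only).
    torch.manual_seed(0)
    X, y = torch.randn(128, 10), torch.randn(128, 1)
    model = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 1))
    loss_fn = nn.MSELoss()

    # Flatten the parameters and pick a random direction of the same size.
    theta = torch.cat([p.detach().flatten() for p in model.parameters()])
    direction = torch.randn_like(theta)
    direction *= theta.norm() / direction.norm()  # crude scale normalization

    def set_params(vec):
        i = 0
        for p in model.parameters():
            n = p.numel()
            p.data = vec[i:i + n].view_as(p)
            i += n

    # Evaluate the loss along the 1-D slice theta + t * direction.
    for t in torch.linspace(-1.0, 1.0, 11):
        set_params(theta + t * direction)
        with torch.no_grad():
            print(f"t={t:+.1f}  loss={loss_fn(model(X), y).item():.4f}")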
Optimization for deep learning: theory and algorithms
R Sun - arXiv preprint arXiv:1912.08957, 2019 - arxiv.org
When and why can a neural network be successfully trained? This article provides an
overview of optimization algorithms and theory for training neural networks. First, we discuss …
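For reference, the basic object such surveys analyze is the gradient step theta <- theta - eta * grad L(theta). A minimal gradient-descent sketch on a least-squares toy problem (the problem sizes and step size are our choices):

    import numpy as np

    # Minimize f(w) = 0.5 * ||A w - b||^2 with plain gradient descent.
    rng = np.random.default_rng(0)
    A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)
    w = np.zeros(5)
    lr = 0.01  # step size; must be < 2 / lambda_max(A^T A) for convergence
    for step in range(500):
        grad = A.T @ (A @ w - b)  # gradient of the least-squares loss
        w -= lr * grad
    print("final loss:", 0.5 * np.linalg.norm(A @ w - b) ** 2)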
Mechanistic mode connectivity
We study neural network loss landscapes through the lens of mode connectivity, the
observation that minimizers of neural networks retrieved via training on a dataset are …
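Mode connectivity is usually probed by evaluating the loss along a path between two independently trained minimizers; the simplest version is the straight line. A sketch (the toy model and data are ours):

    import copy
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    X, y = torch.randn(256, 10), torch.randint(0, 2, (256,))

    def train(seed):
        torch.manual_seed(seed)
        net = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
        opt = torch.optim.SGD(net.parameters(), lr=0.1)
        for _ in range(200):
            opt.zero_grad()
            nn.functional.cross_entropy(net(X), y).backward()
            opt.step()
        return net

    net_a, net_b = train(1), train(2)

    # Loss along the straight line (1 - t) * theta_a + t * theta_b.
    net_t = copy.deepcopy(net_a)
    for t in torch.linspace(0, 1, 11):
        for p_t, p_a, p_b in zip(net_t.parameters(),
                                 net_a.parameters(), net_b.parameters()):
            p_t.data = (1 - t) * p_a.data + t * p_b.data
        with torch.no_grad():
            loss = nn.functional.cross_entropy(net_t(X), y).item()
        print(f"t={t:.1f}  loss={loss:.4f}")  # a bump in the middle = barrier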
What Happens after SGD Reaches Zero Loss?--A Mathematical Framework
Understanding the implicit bias of Stochastic Gradient Descent (SGD) is one of the key
challenges in deep learning, especially for overparametrized models, where the local …
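A toy illustration in the spirit of this framework (our own construction, not from the paper): SGD with label noise on the overparametrized factorization w1 * w2 ≈ y keeps moving along the zero-loss set {w1 * w2 = 1} after fitting, and the framework predicts a slow drift toward flatter, balanced points:

    import numpy as np

    rng = np.random.default_rng(0)
    # Toy overparametrized model: predict y with w1 * w2. Every point on the
    # curve {w1 * w2 = 1} fits the clean label y = 1 with exactly zero loss.
    w1, w2 = 2.0, 0.5          # start at an unbalanced zero-loss point
    lr, sigma = 0.05, 0.5      # step size and label-noise scale (our choices)
    for step in range(1, 50001):
        y = 1.0 + sigma * rng.standard_normal()        # label noise
        r = w1 * w2 - y                                # residual
        w1, w2 = w1 - lr * r * w2, w2 - lr * r * w1    # SGD step on 0.5 * r**2
        if step % 10000 == 0:
            print(f"w1={w1:+.3f}  w2={w2:+.3f}  w1*w2={w1 * w2:.3f}")
    # The product stays pinned near 1 (zero loss), but the iterate keeps moving
    # along the manifold; the paper's framework predicts a drift toward the
    # flatter, balanced region where |w1| = |w2| = 1.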
Geometry of the loss landscape in overparameterized neural networks: Symmetries and invariances
We study how permutation symmetries in overparameterized multi-layer neural networks
generate 'symmetry-induced' critical points. Assuming a network with $L$ layers of minimal …
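The underlying symmetry is easy to verify directly: permuting the hidden units of one layer, together with the matching columns of the next layer's weights, leaves the network function unchanged. A sketch:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    net = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
    x = torch.randn(8, 10)

    # Permute the hidden units: rows (and biases) of the first layer,
    # matching columns of the second layer.
    perm = torch.randperm(16)
    with torch.no_grad():
        out_before = net(x).clone()
        net[0].weight.data = net[0].weight.data[perm]
        net[0].bias.data = net[0].bias.data[perm]
        net[2].weight.data = net[2].weight.data[:, perm]
        out_after = net(x)
    print(torch.allclose(out_before, out_after, atol=1e-6))  # True: same function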
Tight bounds on the smallest eigenvalue of the neural tangent kernel for deep ReLU networks
A recent line of work has analyzed the theoretical properties of deep neural networks via the
Neural Tangent Kernel (NTK). In particular, the smallest eigenvalue of the NTK has been …
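The empirical (finite-width) NTK Gram matrix and its smallest eigenvalue can be computed directly for a small network; a sketch (architecture and sample size are illustrative):

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    net = nn.Sequential(nn.Linear(5, 64), nn.ReLU(), nn.Linear(64, 1))
    X = torch.randn(16, 5)

    # Empirical NTK Gram matrix: K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>.
    grads = []
    for i in range(X.shape[0]):
        net.zero_grad()
        net(X[i:i + 1]).sum().backward()
        grads.append(torch.cat([p.grad.flatten() for p in net.parameters()]))
    J = torch.stack(grads)   # (n, num_params) Jacobian of outputs w.r.t. params
    K = J @ J.T              # (n, n) NTK Gram matrix, positive semidefinite
    print("smallest eigenvalue:", torch.linalg.eigvalsh(K)[0].item())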
Going beyond linear mode connectivity: The layerwise linear feature connectivity
Recent work has revealed many intriguing empirical phenomena in neural network training,
despite the poorly understood and highly complex loss landscapes and training dynamics …
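Layerwise linear feature connectivity (LLFC) compares the features of the weight-interpolated network with the interpolation of the two networks' features. The sketch below only demonstrates the measurement, using untrained nets; the paper's claim concerns suitably trained (and typically permutation-aligned) networks, where this similarity is reported to be near 1:

    import copy
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    x = torch.randn(32, 10)

    def make_net(seed):
        torch.manual_seed(seed)
        return nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))

    net_a, net_b = make_net(1), make_net(2)

    def features(net, x):
        return net[1](net[0](x))  # post-ReLU activations of the first layer

    # Interpolate the weights, then compare the midpoint network's features
    # against the midpoint of the two networks' features.
    net_mid = copy.deepcopy(net_a)
    for p_m, p_a, p_b in zip(net_mid.parameters(),
                             net_a.parameters(), net_b.parameters()):
        p_m.data = 0.5 * (p_a.data + p_b.data)
    f_mid = features(net_mid, x)
    f_avg = 0.5 * (features(net_a, x) + features(net_b, x))
    cos = nn.functional.cosine_similarity(f_mid.flatten(), f_avg.flatten(), dim=0)
    print("cosine similarity:", cos.item())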
Learning ReLU networks on linearly separable data: Algorithm, optimality, and generalization
Neural networks with rectified linear unit (ReLU) activation functions (aka ReLU networks)
have achieved great empirical success in various domains. Nonetheless, existing results for …
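A minimal experiment matching the setting in the title (the data and architecture are our choices): train a small ReLU network on linearly separable data and check that it fits the training set:

    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    # Linearly separable data: labels given by the sign of a fixed linear map.
    X = torch.randn(200, 2)
    w_star = torch.tensor([1.0, -2.0])
    y = (X @ w_star > 0).long()

    net = nn.Sequential(nn.Linear(2, 8), nn.ReLU(), nn.Linear(8, 2))
    opt = torch.optim.SGD(net.parameters(), lr=0.1)
    for _ in range(500):
        opt.zero_grad()
        nn.functional.cross_entropy(net(X), y).backward()
        opt.step()
    acc = (net(X).argmax(1) == y).float().mean()
    print("train accuracy:", acc.item())  # typically 1.0 on separable data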
DART: Diversify-aggregate-repeat training improves generalization of neural networks
Generalization of neural networks is crucial for deploying them safely in the real
world. Common training strategies to improve generalization involve the use of data …
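A schematic of the diversify-aggregate-repeat loop as we read the title (the paper's actual augmentations, schedules, and aggregation details differ): train several clones on different views of the data, average their weights back into one model, and repeat:

    import copy
    import torch
    import torch.nn as nn

    torch.manual_seed(0)
    X, y = torch.randn(256, 10), torch.randint(0, 2, (256,))

    base = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))
    n_members, n_rounds, steps = 3, 4, 100
    for _ in range(n_rounds):
        # Diversify: clone the base model and train each copy on a different
        # randomly augmented view of the data (a stand-in for real augmentations).
        members = [copy.deepcopy(base) for _ in range(n_members)]
        for net in members:
            opt = torch.optim.SGD(net.parameters(), lr=0.05)
            for _ in range(steps):
                X_aug = X + 0.1 * torch.randn_like(X)
                opt.zero_grad()
                nn.functional.cross_entropy(net(X_aug), y).backward()
                opt.step()
        # Aggregate: average member weights into the base model, then repeat.
        with torch.no_grad():
            for i, p in enumerate(base.parameters()):
                p.copy_(sum(list(net.parameters())[i] for net in members)
                        / n_members)
    print("loss:", nn.functional.cross_entropy(base(X), y).item())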