Optimization for deep learning: An overview
RY Sun - Journal of the Operations Research Society of China, 2020 - Springer
Optimization is a critical component in deep learning. We think optimization for neural
networks is an interesting topic for theoretical research due to various reasons. First, its …
The global landscape of neural networks: An overview
One of the major concerns for neural network training is that the nonconvexity of the
associated loss functions may cause a bad landscape. The recent success of neural …
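Not from the paper, just a minimal numpy sketch of the kind of probe used in landscape studies: evaluating the training loss of a tiny two-layer network along one random direction in parameter space. The network, data, and direction are illustrative assumptions; such one-dimensional slices are how non-convex structure is often visualized.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data and a tiny two-layer network: f(x) = W2 @ tanh(W1 @ x).
X = rng.normal(size=(3, 50))           # 3 input features, 50 samples
y = rng.normal(size=(1, 50))
W1 = rng.normal(size=(8, 3))
W2 = rng.normal(size=(1, 8))

def loss(W1, W2):
    pred = W2 @ np.tanh(W1 @ X)
    return float(np.mean((pred - y) ** 2))

# One-dimensional slice of the landscape: L(theta + t * d) for a random direction d.
D1, D2 = rng.normal(size=W1.shape), rng.normal(size=W2.shape)
for t in np.linspace(-2.0, 2.0, 9):
    print(f"t={t:+.2f}  loss={loss(W1 + t * D1, W2 + t * D2):.4f}")
```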
On the opportunities and risks of foundation models
AI is undergoing a paradigm shift with the rise of models (eg, BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …
Exploring deep neural networks via layer-peeled model: Minority collapse in imbalanced training
In this paper, we introduce the Layer-Peeled Model, a nonconvex, yet analytically tractable,
optimization program, in a quest to better understand deep neural networks that are trained …
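A hedged note on the program named in this entry: the Layer-Peeled Model treats the last-layer features as free optimization variables alongside the classifier weights. Assuming its commonly stated form (with K classes, n_k examples in class k, N the total sample count, and norm budgets E_W, E_H as notation chosen here), it reads roughly as:

```latex
\min_{\mathbf{W},\,\mathbf{H}} \;
  \frac{1}{N}\sum_{k=1}^{K}\sum_{i=1}^{n_k}
  \mathcal{L}\bigl(\mathbf{W}\mathbf{h}_{k,i},\, \mathbf{y}_k\bigr)
\quad \text{s.t.} \quad
  \frac{1}{K}\sum_{k=1}^{K}\lVert \mathbf{w}_k\rVert_2^2 \le E_W,
\qquad
  \frac{1}{N}\sum_{k=1}^{K}\sum_{i=1}^{n_k}\lVert \mathbf{h}_{k,i}\rVert_2^2 \le E_H
```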
Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the neural tangent kernel
In suitably initialized wide networks, small learning rates transform deep neural networks
(DNNs) into neural tangent kernel (NTK) machines, whose training dynamics is well …
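As a hedged aside on the object this entry studies: the empirical NTK of a network f(x; theta) is the Gram matrix of its parameter gradients, K(x, x') = <grad_theta f(x; theta), grad_theta f(x'; theta)>. A minimal numpy sketch with an assumed tiny MLP and finite-difference gradients, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny scalar-output MLP f(x; theta), with theta packed as one flat vector.
W1, b1 = rng.normal(size=(8, 3)) / np.sqrt(3), np.zeros(8)
W2 = rng.normal(size=(1, 8)) / np.sqrt(8)
theta0 = np.concatenate([W1.ravel(), b1, W2.ravel()])

def f(x, theta):
    W1 = theta[:24].reshape(8, 3)
    b1 = theta[24:32]
    W2 = theta[32:].reshape(1, 8)
    return float(W2 @ np.tanh(W1 @ x + b1))

def grad(x, theta, eps=1e-5):
    # Finite-difference gradient of f(x; .) with respect to the parameters.
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta); e[i] = eps
        g[i] = (f(x, theta + e) - f(x, theta - e)) / (2 * eps)
    return g

# Empirical NTK on a few inputs: K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>.
xs = [rng.normal(size=3) for _ in range(4)]
G = np.stack([grad(x, theta0) for x in xs])
K = G @ G.T
print(np.round(K, 3))
```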
Model merging in LLMs, MLLMs, and beyond: Methods, theories, applications and opportunities
Model merging is an efficient empowerment technique in the machine learning community
that does not require the collection of raw training data and does not require expensive …
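A minimal sketch of the simplest data-free merge alluded to here: uniform (or weighted) averaging of parameters across models that share an architecture. The parameter names are invented for illustration; practical merging methods surveyed in the paper add considerably more structure.

```python
import numpy as np

def merge_average(state_dicts, weights=None):
    """Merge models with identical architectures by (weighted) parameter averaging."""
    n = len(state_dicts)
    weights = weights or [1.0 / n] * n
    keys = state_dicts[0].keys()
    return {k: sum(w * sd[k] for w, sd in zip(weights, state_dicts)) for k in keys}

# Two "fine-tuned" checkpoints with the same parameter names (illustrative only).
rng = np.random.default_rng(0)
model_a = {"layer.weight": rng.normal(size=(4, 4)), "layer.bias": rng.normal(size=4)}
model_b = {k: v + 0.1 * rng.normal(size=v.shape) for k, v in model_a.items()}

merged = merge_average([model_a, model_b])
print(merged["layer.bias"])
```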
Optimization for deep learning: theory and algorithms
R Sun - arXiv preprint arXiv:1912.08957, 2019 - arxiv.org
When and why can a neural network be successfully trained? This article provides an
overview of optimization algorithms and theory for training neural networks. First, we discuss …
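A hedged illustration of the kind of baseline algorithm such surveys analyze, heavy-ball SGD with momentum, applied here to an assumed ill-conditioned quadratic that stands in for a training loss:

```python
import numpy as np

def sgd_momentum(grad_fn, theta, lr=0.1, beta=0.9, steps=200):
    """Heavy-ball update: v <- beta * v + grad(theta); theta <- theta - lr * v."""
    v = np.zeros_like(theta)
    for _ in range(steps):
        v = beta * v + grad_fn(theta)
        theta = theta - lr * v
    return theta

# Illustrative stand-in for a training loss: the ill-conditioned quadratic
# 0.5 * theta^T A theta, whose gradient is A @ theta and whose minimizer is 0.
A = np.diag([10.0, 1.0])
theta_final = sgd_momentum(lambda th: A @ th, theta=np.array([1.0, 1.0]))
print(theta_final)   # close to the minimizer at the origin
```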
Mechanistic mode connectivity
We study neural network loss landscapes through the lens of mode connectivity, the
observation that minimizers of neural networks retrieved via training on a dataset are …
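A minimal sketch (assumed toy model and data, not the paper's setup) of the basic measurement behind mode connectivity: train two solutions independently, then evaluate the loss along the straight line between them; a bump in the middle is the barrier that connectivity analyses try to explain or remove.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 40))
y = np.sin(X.sum(axis=0))

def loss(theta):
    # Tiny two-layer tanh network, 6 hidden units, parameters packed in one vector.
    W1, W2 = theta[:12].reshape(6, 2), theta[12:].reshape(1, 6)
    return float(np.mean((W2 @ np.tanh(W1 @ X) - y) ** 2))

def train(theta, lr=0.02, steps=3000, eps=1e-5):
    # Crude finite-difference gradient descent, enough to settle into one basin.
    for _ in range(steps):
        g = np.array([(loss(theta + eps * e) - loss(theta - eps * e)) / (2 * eps)
                      for e in np.eye(theta.size)])
        theta = theta - lr * g
    return theta

theta_a = train(rng.normal(size=18))
theta_b = train(rng.normal(size=18))

# Loss along the straight line (1 - a) * theta_a + a * theta_b between the two minima.
for a in np.linspace(0.0, 1.0, 6):
    print(f"alpha={a:.1f}  loss={loss((1 - a) * theta_a + a * theta_b):.4f}")
```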
What Happens after SGD Reaches Zero Loss?--A Mathematical Framework
Understanding the implicit bias of Stochastic Gradient Descent (SGD) is one of the key
challenges in deep learning, especially for overparametrized models, where the local …
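This is only a toy numpy illustration of the phenomenon the paper formalizes, not its framework: once the iterate sits on a zero-loss manifold, gradient noise (here label noise) produces a slow implicit drift along the manifold toward its flattest point. The two-parameter model u * v, the step size, and the noise level are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy overparametrized model: predict the constant target 1 with f(u, v) = u * v.
# Every point on the curve u * v = 1 gives exactly zero clean loss; the flattest
# of these minima (smallest Hessian trace, proportional to u**2 + v**2) is u = v = 1.
u, v = 2.0, 0.5            # a zero-loss but comparatively sharp starting point
lr, sigma = 0.02, 0.5      # step size and label-noise level (illustrative values)

for step in range(50001):
    y = 1.0 + sigma * rng.normal()       # noisy label keeps the gradient nonzero
    r = u * v - y                        # residual of the per-step loss (u*v - y)**2
    u, v = u - lr * 2 * r * v, v - lr * 2 * r * u
    if step % 10000 == 0:
        print(f"step={step:6d}  u={u:.3f}  v={v:.3f}  u*v={u*v:.3f}")
```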
Re-basin via implicit sinkhorn differentiation
The recent emergence of new algorithms for permuting models into functionally equivalent
regions of the solution space has shed some light on the complexity of error surfaces and …
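A minimal sketch of the permutation symmetry these re-basin methods exploit: reordering the hidden units of a layer, while permuting the rows and columns of the adjacent weight matrices to match, gives a different parameter vector that computes exactly the same function. The Sinkhorn machinery in the paper is about learning such permutations and is not shown here; the network below is an assumed toy.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer MLP: f(x) = W2 @ relu(W1 @ x + b1).
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
W2 = rng.normal(size=(2, 5))
relu = lambda z: np.maximum(z, 0.0)

def f(x, W1, b1, W2):
    return W2 @ relu(W1 @ x + b1)

# Permute the 5 hidden units with a random permutation matrix P, adjusting the
# surrounding weights so the permuted network is functionally identical.
P = np.eye(5)[rng.permutation(5)]
W1p, b1p, W2p = P @ W1, P @ b1, W2 @ P.T

x = rng.normal(size=3)
print(f(x, W1, b1, W2))
print(f(x, W1p, b1p, W2p))   # same outputs: both weight settings lie in one "mode"
```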