Optimization for deep learning: An overview

RY Sun - Journal of the Operations Research Society of China, 2020 - Springer
Optimization is a critical component in deep learning. We think optimization for neural
networks is an interesting topic for theoretical research for several reasons. First, its …

DiffFit: Unlocking transferability of large diffusion models via simple parameter-efficient fine-tuning

E Xie, L Yao, H Shi, Z Liu, D Zhou… - Proceedings of the …, 2023 - openaccess.thecvf.com
Diffusion models have proven to be highly effective in generating high-quality images.
However, adapting large pre-trained diffusion models to new domains remains an open …
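
As a rough illustration of the parameter-efficient idea DiffFit builds on (freeze the pre-trained weights, train only bias terms and small learnable scale factors), here is a minimal NumPy sketch on a toy frozen linear layer; the layer, the toy "new domain", and the name gamma are illustrative assumptions, not the paper's actual architecture:

    import numpy as np

    rng = np.random.default_rng(0)

    # "Pre-trained" frozen weights of a single linear layer.
    W = rng.normal(size=(8, 8))

    # DiffFit-style trainable parameters: a bias and a per-output scale,
    # initialized so the adapted layer starts identical to the frozen one.
    b = np.zeros(8)
    gamma = np.ones(8)

    def forward(x):
        return gamma * (x @ W.T) + b

    # Toy "new domain": targets are a rescaled, shifted version of the
    # frozen layer's outputs.
    X = rng.normal(size=(256, 8))
    Y = 1.5 * (X @ W.T) + 0.3

    lr = 0.05
    for step in range(500):
        err = forward(X) - Y                           # dL/dpred for 0.5 * MSE
        b -= lr * err.mean(axis=0)
        gamma -= lr * (err * (X @ W.T)).mean(axis=0)   # W is never updated

    print("final MSE:", np.mean((forward(X) - Y) ** 2))

Only b and gamma are ever updated, so the trainable-parameter count grows with the output width rather than with the full weight matrix.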

Optimization for deep learning: theory and algorithms

R Sun - arXiv preprint arXiv:1912.08957, 2019 - arxiv.org
When and why can a neural network be successfully trained? This article provides an
overview of optimization algorithms and theory for training neural networks. First, we discuss …

An improved analysis of training over-parameterized deep neural networks

D Zou, Q Gu - Advances in neural information processing …, 2019 - proceedings.neurips.cc
A recent line of research has shown that gradient-based algorithms with random
initialization can converge to the global minima of the training loss for over-parameterized …
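
To see the claimed phenomenon concretely, here is a small self-contained demo (our illustration, not the authors' construction): a sufficiently wide one-hidden-layer ReLU network with random initialization, trained by full-batch gradient descent on its first layer only, fits even random labels to near-zero training loss:

    import numpy as np

    rng = np.random.default_rng(1)
    n, d, m = 10, 5, 2000            # few samples, very wide hidden layer

    X = rng.normal(size=(n, d))
    y = rng.normal(size=n)           # arbitrary (random) labels

    W = rng.normal(size=(m, d)) / np.sqrt(d)            # random first layer
    a = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)    # fixed output layer

    def net(X):
        return np.maximum(X @ W.T, 0.0) @ a

    lr = 0.5
    for step in range(3000):
        H = X @ W.T
        err = np.maximum(H, 0.0) @ a - y                # (n,)
        # Gradient of 0.5 * sum(err^2) w.r.t. W; the output layer a stays
        # fixed, as in NTK-style analyses.
        W -= (lr / n) * ((err[:, None] * (H > 0)) * a).T @ X

    print("train loss:", 0.5 * np.mean((net(X) - y) ** 2))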

Fast convergence of natural gradient descent for over-parameterized neural networks

G Zhang, J Martens, RB Grosse - Advances in Neural …, 2019 - proceedings.neurips.cc
Natural gradient descent has proven very effective at mitigating the catastrophic effects of
pathological curvature in the objective function, but little is known theoretically about its …
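
For orientation, a minimal sketch of natural gradient descent in one case where the Fisher matrix has a simple closed form, a linear model with Gaussian likelihood (F = X^T X / n); the damping constant is an illustrative choice:

    import numpy as np

    rng = np.random.default_rng(2)
    n, d = 200, 10
    X = rng.normal(size=(n, d))
    w_true = rng.normal(size=d)
    y = X @ w_true + 0.1 * rng.normal(size=n)

    w = np.zeros(d)
    lr, damping = 1.0, 1e-3

    for step in range(20):
        err = X @ w - y
        grad = X.T @ err / n                  # gradient of 0.5 * MSE
        # For a Gaussian-likelihood linear model the Fisher matrix is
        # F = X^T X / n; natural gradient preconditions by F^{-1}.
        F = X.T @ X / n
        w -= lr * np.linalg.solve(F + damping * np.eye(d), grad)

    print("distance to w_true:", np.linalg.norm(w - w_true))

On this quadratic objective the natural gradient direction coincides with Newton's, which is why a few well-scaled steps suffice even when X^T X is badly conditioned.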

Learning one-hidden-layer ReLU networks via gradient descent

X Zhang, Y Yu, L Wang, Q Gu - The 22nd international …, 2019 - proceedings.mlr.press
We study the problem of learning one-hidden-layer neural networks with the Rectified Linear
Unit (ReLU) activation function, where the inputs are sampled from standard Gaussian …
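
A toy teacher-student version of this setting, assuming Gaussian inputs and a warm start near the teacher (the paper's actual initialization scheme and analysis are more delicate):

    import numpy as np

    rng = np.random.default_rng(3)
    d, k, n = 10, 4, 5000            # input dim, hidden width, sample size

    # Teacher: y = sum_j relu(w*_j . x), inputs standard Gaussian.
    W_star = rng.normal(size=(k, d))
    X = rng.normal(size=(n, d))
    y = np.maximum(X @ W_star.T, 0.0).sum(axis=1)

    # Student with the same architecture, warm-started near the teacher.
    W = W_star + 0.3 * rng.normal(size=(k, d))
    lr = 0.2
    for step in range(1000):
        H = X @ W.T
        err = np.maximum(H, 0.0).sum(axis=1) - y        # (n,)
        W -= lr * (err[:, None] * (H > 0)).T @ X / n    # (k, d) gradient

    print("relative parameter error:",
          np.linalg.norm(W - W_star) / np.linalg.norm(W_star))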

Generalization error bounds of gradient descent for learning over-parameterized deep ReLU networks

Y Cao, Q Gu - Proceedings of the AAAI Conference on Artificial …, 2020 - ojs.aaai.org
Empirical studies show that gradient-based methods can learn deep neural networks
(DNNs) with very good generalization performance in the over-parameterization regime …

From symmetry to geometry: Tractable nonconvex problems

Y Zhang, Q Qu, J Wright - arXiv preprint arXiv:2007.06753, 2020 - arxiv.org
As science and engineering have become increasingly data-driven, the role of optimization
has expanded to touch almost every stage of the data analysis pipeline, from signal and …

Provably learning a multi-head attention layer

S Chen, Y Li - arXiv preprint arXiv:2402.04084, 2024 - arxiv.org
The multi-head attention layer is one of the key components of the transformer architecture
that sets it apart from traditional feed-forward models. Given a sequence length $ k …
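
For reference, a minimal NumPy forward pass of a standard multi-head attention layer over a length-k sequence; the shapes and weight names follow common convention, not the paper's notation:

    import numpy as np

    def softmax(z, axis=-1):
        z = z - z.max(axis=axis, keepdims=True)
        e = np.exp(z)
        return e / e.sum(axis=axis, keepdims=True)

    def multi_head_attention(X, Wq, Wk, Wv, Wo):
        """X: (k, d) sequence; Wq/Wk/Wv: (h, d, d_head); Wo: (h*d_head, d)."""
        h, d, d_head = Wq.shape
        heads = []
        for i in range(h):
            Q, K, V = X @ Wq[i], X @ Wk[i], X @ Wv[i]         # (k, d_head) each
            A = softmax(Q @ K.T / np.sqrt(d_head), axis=-1)   # (k, k) attention
            heads.append(A @ V)                               # (k, d_head)
        return np.concatenate(heads, axis=-1) @ Wo            # (k, d)

    rng = np.random.default_rng(4)
    k, d, h, d_head = 6, 16, 4, 4
    X = rng.normal(size=(k, d))
    Wq, Wk, Wv = (rng.normal(size=(h, d, d_head)) for _ in range(3))
    Wo = rng.normal(size=(h * d_head, d))
    print(multi_head_attention(X, Wq, Wk, Wv, Wo).shape)      # (6, 16)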

Learning deep ReLU networks is fixed-parameter tractable

S Chen, AR Klivans, R Meka - 2021 IEEE 62nd Annual …, 2022 - ieeexplore.ieee.org
We consider the problem of learning an unknown ReLU network with respect to Gaussian
inputs and obtain the first nontrivial results for networks of depth more than two. We give an …