Optimization for deep learning: An overview
RY Sun - Journal of the Operations Research Society of China, 2020 - Springer
Optimization is a critical component in deep learning. We think optimization for neural
networks is an interesting topic for theoretical research due to various reasons. First, its …
DiffFit: Unlocking transferability of large diffusion models via simple parameter-efficient fine-tuning
Diffusion models have proven to be highly effective in generating high-quality images.
However, adapting large pre-trained diffusion models to new domains remains an open …
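Since the snippet only names the problem, a minimal PyTorch sketch of the parameter-efficient idea the title refers to may help: freeze the pre-trained weights and train only bias terms plus small learnable scale factors. The wrapper class, the bias-only rule, and the placement of the scale factors below are illustrative assumptions, not DiffFit's exact recipe.

```python
import torch
import torch.nn as nn

class ScaledLinear(nn.Module):
    """Frozen pre-trained linear layer with a learnable scale factor."""
    def __init__(self, pretrained: nn.Linear):
        super().__init__()
        self.inner = pretrained
        self.gamma = nn.Parameter(torch.ones(1))  # trained during adaptation

    def forward(self, x):
        return self.gamma * self.inner(x)

def prepare_for_finetuning(model: nn.Module) -> nn.Module:
    # Freeze every pre-trained parameter ...
    for p in model.parameters():
        p.requires_grad = False
    # ... then re-enable gradients for bias terms only.
    for name, p in model.named_parameters():
        if name.endswith("bias"):
            p.requires_grad = True
    # Wrap linear layers with learnable scales (placement is hypothetical).
    for name, child in model.named_children():
        if isinstance(child, nn.Linear):
            setattr(model, name, ScaledLinear(child))
        else:
            prepare_for_finetuning(child)
    return model
```

An optimizer would then see only the small trainable subset, e.g. `torch.optim.AdamW(p for p in model.parameters() if p.requires_grad)`, which is what makes the adaptation cheap relative to full fine-tuning.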
Optimization for deep learning: theory and algorithms
R Sun - arXiv preprint arXiv:1912.08957, 2019 - arxiv.org
When and why can a neural network be successfully trained? This article provides an
overview of optimization algorithms and theory for training neural networks. First, we discuss …
An improved analysis of training over-parameterized deep neural networks
A recent line of research has shown that gradient-based algorithms with random
initialization can converge to the global minima of the training loss for over-parameterized …
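As a quick illustration of the phenomenon described (not of the paper's analysis), the toy PyTorch run below trains a randomly initialized, heavily over-parameterized two-layer ReLU network on a small random dataset. Width, learning rate, and step count are arbitrary choices; with sufficient width the training loss typically decays toward zero, consistent with this line of convergence results.

```python
import torch

torch.manual_seed(0)
n, d, width = 20, 5, 4096                 # tiny dataset, very wide hidden layer
X, y = torch.randn(n, d), torch.randn(n)

# Random initialization; both layers are trained.
W = (torch.randn(width, d) / d ** 0.5).requires_grad_()
a = torch.randn(width).requires_grad_()

opt = torch.optim.SGD([W, a], lr=0.5)
for _ in range(2000):
    pred = torch.relu(X @ W.t()) @ a / width ** 0.5   # NTK-style output scaling
    loss = ((pred - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"final training loss: {loss.item():.2e}")       # approaches zero
```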
Fast convergence of natural gradient descent for over-parameterized neural networks
Natural gradient descent has proven very effective at mitigating the catastrophic effects of
pathological curvature in the objective function, but little is known theoretically about its …
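For readers who have not seen it, natural gradient descent preconditions the gradient with the inverse Fisher information matrix, theta <- theta - lr * F^{-1} * grad(L), which is what lets it cut through pathological curvature. Below is a small PyTorch sketch for a toy regression model, approximating F by the damped Gauss-Newton matrix J^T J / n (exact for a unit-variance Gaussian likelihood); all sizes and constants are illustrative.

```python
import torch

torch.manual_seed(0)
n, d = 100, 3
X = torch.randn(n, d)
theta_star = torch.tensor([1.0, -2.0, 0.5])
y = torch.tanh(X @ theta_star)             # realizable regression target

theta = torch.zeros(d, requires_grad=True)

def model(theta):
    return torch.tanh(X @ theta)           # one scalar output per example

lr, damping = 1.0, 1e-3
for _ in range(50):
    loss = 0.5 * ((model(theta) - y) ** 2).mean()
    grad = torch.autograd.grad(loss, theta)[0]
    # Per-example Jacobian of the model outputs w.r.t. theta, shape (n, d).
    J = torch.autograd.functional.jacobian(model, theta)
    F = J.t() @ J / n + damping * torch.eye(d)   # damped Fisher approximation
    with torch.no_grad():
        theta -= lr * torch.linalg.solve(F, grad)

print(f"loss after natural gradient descent: {loss.item():.2e}")
```

Forming and inverting F explicitly is only feasible for tiny models like this one; practical large-scale variants rely on structured approximations of F.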
Learning one-hidden-layer relu networks via gradient descent
We study the problem of learning one-hidden-layer neural networks with Rectified Linear
Unit (ReLU) activation function, where the inputs are sampled from standard Gaussian …
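The setting in this abstract is easy to reproduce as a toy teacher-student experiment: Gaussian inputs, labels from a planted one-hidden-layer ReLU network, and a same-shape student trained by full-batch gradient descent. The random initialization and hyperparameters below are illustrative; the paper's guarantees may depend on specific initialization and sample-size conditions.

```python
import torch

torch.manual_seed(0)
d, k, n = 10, 4, 5000
X = torch.randn(n, d)                          # inputs x ~ N(0, I_d)

W_star = torch.randn(k, d) / d ** 0.5          # planted teacher weights
y = torch.relu(X @ W_star.t()).sum(dim=1)      # noiseless labels

W = (torch.randn(k, d) / d ** 0.5).requires_grad_()   # student, random init
opt = torch.optim.SGD([W], lr=0.02)
for _ in range(3000):
    loss = ((torch.relu(X @ W.t()).sum(dim=1) - y) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()

print(f"empirical risk after GD: {loss.item():.2e}")
```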
Generalization error bounds of gradient descent for learning over-parameterized deep relu networks
Empirical studies show that gradient-based methods can learn deep neural networks
(DNNs) with very good generalization performance in the over-parameterization regime …
From symmetry to geometry: Tractable nonconvex problems
As science and engineering have become increasingly data-driven, the role of optimization
has expanded to touch almost every stage of the data analysis pipeline, from signal and …
Provably learning a multi-head attention layer
The multi-head attention layer is one of the key components of the transformer architecture
that sets it apart from traditional feed-forward models. Given a sequence length $k$ …
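Since the snippet cuts off mid-sentence, it may help to recall the object being learned. The following PyTorch function is a standard multi-head attention forward pass for a single sequence of length $k$; it sketches the architecture only, not the paper's learning algorithm or its parameterization.

```python
import torch

def multi_head_attention(Xq, Xk, Xv, Wq, Wk, Wv, Wo, num_heads):
    """Xq, Xk, Xv: (seq_len, d_model); Wq, Wk, Wv, Wo: (d_model, d_model)."""
    seq_len, d_model = Xq.shape
    d_head = d_model // num_heads

    def split(x):  # (seq_len, d_model) -> (num_heads, seq_len, d_head)
        return x.view(seq_len, num_heads, d_head).transpose(0, 1)

    Q, K, V = split(Xq @ Wq), split(Xk @ Wk), split(Xv @ Wv)
    scores = Q @ K.transpose(-2, -1) / d_head ** 0.5   # scaled dot-product
    heads = torch.softmax(scores, dim=-1) @ V          # (heads, seq, d_head)
    concat = heads.transpose(0, 1).reshape(seq_len, d_model)
    return concat @ Wo                                 # output projection

k, d_model = 6, 8
X = torch.randn(k, d_model)
params = [torch.randn(d_model, d_model) / d_model ** 0.5 for _ in range(4)]
out = multi_head_attention(X, X, X, *params, num_heads=2)
print(out.shape)   # torch.Size([6, 8])
```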
Learning deep relu networks is fixed-parameter tractable
We consider the problem of learning an unknown ReLU network with respect to Gaussian
inputs and obtain the first nontrivial results for networks of depth more than two. We give an …