A selective overview of deep learning
Deep learning has achieved tremendous success in recent years. In simple words, deep
learning uses the composition of many nonlinear functions to model the complex …
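For concreteness, the composition referred to can be written generically (the notation here is illustrative, not the survey's own) as a feed-forward network
$$ f(x) \;=\; (f_L \circ \cdots \circ f_1)(x), \qquad f_\ell(z) \;=\; \sigma(W_\ell z + b_\ell), $$
where each layer applies an affine map followed by a nonlinearity $\sigma$.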
Gradient descent finds global minima of deep neural networks
Gradient descent finds a global minimum in training deep neural networks despite the
objective function being non-convex. The current paper proves gradient descent achieves …
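For reference, the method in question is plain gradient descent on the non-convex training loss $L(\theta)$; the symbols below are generic rather than the paper's notation:
$$ \theta_{t+1} \;=\; \theta_t - \eta\, \nabla L(\theta_t), $$
and the result concerns this update reaching a global minimizer of $L$ despite the non-convexity.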
Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks
Recent works have cast some light on the mystery of why deep nets fit any data and
generalize despite being very overparametrized. This paper analyzes training and …
Gradient descent provably optimizes over-parameterized neural networks
One of the mysteries in the success of neural networks is that randomly initialized first-order
methods like gradient descent can achieve zero training loss even though the objective …
Scan and snap: Understanding training dynamics and token composition in 1-layer transformer
The Transformer architecture has shown impressive performance in multiple research domains
and has become the backbone of many neural network models. However, there is limited …
Learning single-index models with shallow neural networks
Single-index models are a class of functions given by an unknown univariate "link" function
applied to an unknown one-dimensional projection of the input. These models are …
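Written out, a single-index model takes the form (notation illustrative):
$$ f(x) \;=\; g(\langle w, x\rangle), \qquad g:\mathbb{R}\to\mathbb{R} \ \text{and}\ w\in\mathbb{R}^d \ \text{both unknown}, $$
so the learner must recover both the one-dimensional projection $w$ and the univariate link $g$.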
Benign overfitting in two-layer convolutional neural networks
Modern neural networks often have great expressive power and can be trained to overfit the
training data, while still achieving a good test performance. This phenomenon is referred to …
Toward moderate overparameterization: Global convergence guarantees for training shallow neural networks
Many modern neural network architectures are trained in an overparameterized regime
where the parameters of the model exceed the size of the training dataset. Sufficiently …
Theoretical insights into the optimization landscape of over-parameterized shallow neural networks
In this paper, we study the problem of learning a shallow artificial neural network that best
fits a training data set. We study this problem in the over-parameterized regime where the …
Toward understanding the feature learning process of self-supervised contrastive learning
We formally study how contrastive learning learns the feature representations for neural
networks by investigating its feature learning process. We consider the case where our data …
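As background, a common instantiation of the contrastive objective (a generic InfoNCE-style form, not necessarily the exact loss analyzed in the paper) is
$$ \mathcal{L} \;=\; -\,\mathbb{E}\left[\log \frac{\exp\!\big(\mathrm{sim}(z, z^{+})/\tau\big)}{\exp\!\big(\mathrm{sim}(z, z^{+})/\tau\big) + \sum_{j} \exp\!\big(\mathrm{sim}(z, z_j^{-})/\tau\big)}\right], $$
where $z$ and $z^{+}$ are representations of two augmentations of the same input, the $z_j^{-}$ come from other inputs, and $\tau$ is a temperature parameter.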