Nonconvex optimization meets low-rank matrix factorization: An overview

Y Chi, YM Lu, Y Chen - IEEE Transactions on Signal …, 2019 - ieeexplore.ieee.org
Substantial progress has been made recently on developing provably accurate and efficient
algorithms for low-rank matrix factorization via nonconvex optimization. While conventional …
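As a hedged illustration of the problem class this survey treats (the matrix sizes, initialization scale, and step size below are my own choices, not from the paper), here is a minimal NumPy sketch of gradient descent on the Burer-Monteiro factorization objective f(X) = ||XX^T - M||_F^2 / 4:

```python
# Minimal sketch: gradient descent on a nonconvex low-rank factorization.
import numpy as np

rng = np.random.default_rng(0)
n, r = 50, 3
U = rng.standard_normal((n, r))
M = U @ U.T                               # ground-truth rank-r PSD matrix

X = 0.1 * rng.standard_normal((n, r))     # small random initialization
eta = 0.5 / np.linalg.norm(M, 2)          # step size scaled by the spectral norm

for _ in range(2000):
    grad = (X @ X.T - M) @ X              # gradient of ||X X^T - M||_F^2 / 4
    X -= eta * grad

print(np.linalg.norm(X @ X.T - M) / np.linalg.norm(M))  # relative residual, near 0
```

Despite the nonconvexity, plain gradient descent from small random initialization drives the residual to (numerically) zero here, the kind of benign behavior the survey explains.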

Optimization for deep learning: An overview

RY Sun - Journal of the Operations Research Society of China, 2020 - Springer
Optimization is a critical component in deep learning. We think optimization for neural
networks is an interesting topic for theoretical research for several reasons. First, its …
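Since the survey's subject is first-order training methods, a minimal sketch of one such update, heavy-ball SGD with momentum, may help fix ideas (the toy quadratic objective and hyperparameters are mine, not the survey's):

```python
# Minimal sketch of the momentum update v <- mu*v - lr*g, w <- w + v,
# run on a toy quadratic loss f(w) = 0.5*||w||^2 (so grad f = w).
import numpy as np

def sgd_momentum_step(w, v, grad, lr=0.01, momentum=0.9):
    v = momentum * v - lr * grad
    return w + v, v

w, v = np.ones(5), np.zeros(5)
for _ in range(100):
    w, v = sgd_momentum_step(w, v, grad=w)
print(np.linalg.norm(w))   # decays toward 0
```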

Scan and snap: Understanding training dynamics and token composition in 1-layer transformer

Y Tian, Y Wang, B Chen, SS Du - Advances in Neural …, 2023 - proceedings.neurips.cc
The Transformer architecture has shown impressive performance in multiple research domains
and has become the backbone of many neural network models. However, there is limited …
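For context on the object of study, here is a minimal single-head scaled dot-product self-attention layer, i.e., the core of the 1-layer transformer whose training dynamics the paper analyzes (the shapes and random weights below are illustrative, not the paper's setup):

```python
# Minimal sketch of single-head self-attention: softmax(Q K^T / sqrt(d)) V.
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d) token embeddings; returns (seq_len, d) outputs."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[1])    # scaled dot-product scores
    return softmax(scores, axis=-1) @ V       # attention-weighted values

rng = np.random.default_rng(0)
d = 8
X = rng.standard_normal((5, d))
out = self_attention(X, *(rng.standard_normal((d, d)) for _ in range(3)))
print(out.shape)   # (5, 8)
```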

Understanding self-supervised learning dynamics without contrastive pairs

Y Tian, X Chen, S Ganguli - International Conference on …, 2021 - proceedings.mlr.press
While contrastive approaches to self-supervised learning (SSL) learn representations by
minimizing the distance between two augmented views of the same data point (positive …
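A minimal sketch of the non-contrastive objective this line of work studies (BYOL/SimSiam-style): negative cosine similarity between a predictor output and a stop-gradient target, with no negative pairs. NumPy has no autodiff, so the stop-gradient is only marked by a comment here; everything else is my simplification:

```python
import numpy as np

def normalize(z):
    return z / np.linalg.norm(z, axis=-1, keepdims=True)

def noncontrastive_loss(p1, z2):
    """p1: predictor output on view 1; z2: target-branch output on view 2.
    z2 would carry a stop-gradient in a real training graph."""
    return -(normalize(p1) * normalize(z2)).sum(axis=-1).mean()

rng = np.random.default_rng(0)
p1, z2 = rng.standard_normal((4, 16)), rng.standard_normal((4, 16))
print(noncontrastive_loss(p1, z2))
```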

Towards understanding ensemble, knowledge distillation and self-distillation in deep learning

Z Allen-Zhu, Y Li - arXiv preprint arXiv:2012.09816, 2020 - arxiv.org
We formally study how ensembles of deep learning models can improve test accuracy, and
how the superior performance of an ensemble can be distilled into a single model using …
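As a sketch of the setting analyzed (temperature, shapes, and averaging scheme are my illustrative choices), knowledge distillation trains a student to match the averaged soft labels of an ensemble of teachers:

```python
# Minimal sketch: cross-entropy from the ensemble's averaged soft labels
# (temperature-softened) to the student's soft predictions.
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits_list, T=4.0):
    teacher = np.mean([softmax(t, T) for t in teacher_logits_list], axis=0)
    student = softmax(student_logits, T)
    return -(teacher * np.log(student + 1e-12)).sum(axis=-1).mean()

rng = np.random.default_rng(0)
teachers = [rng.standard_normal((8, 10)) for _ in range(3)]
student = rng.standard_normal((8, 10))
print(distillation_loss(student, teachers))
```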

Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks

S Arora, S Du, W Hu, Z Li… - … Conference on Machine …, 2019 - proceedings.mlr.press
Recent works have cast some light on the mystery of why deep nets fit any data and
generalize despite being very overparametrized. This paper analyzes training and …
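The analysis in this line of work is phrased in terms of the neural tangent kernel Gram matrix; restated from memory as a sketch, for a two-layer ReLU network with unit-norm inputs $x_1,\dots,x_n$ it has the closed form

$$H^{\infty}_{ij} \;=\; \mathbb{E}_{w\sim\mathcal{N}(0,I)}\!\left[x_i^{\top}x_j\,\mathbf{1}\{w^{\top}x_i\ge 0,\; w^{\top}x_j\ge 0\}\right] \;=\; \frac{x_i^{\top}x_j\left(\pi-\arccos(x_i^{\top}x_j)\right)}{2\pi},$$

and the paper's data-dependent generalization bound scales as $\sqrt{y^{\top}(H^{\infty})^{-1}y\,/\,n}$ up to constants and lower-order terms.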

A convergence theory for deep learning via over-parameterization

Z Allen-Zhu, Y Li, Z Song - International conference on …, 2019 - proceedings.mlr.press
Deep neural networks (DNNs) have demonstrated dominating performance in many fields;
since AlexNet, networks used in practice are going wider and deeper. On the theoretical …

Gradient descent finds global minima of deep neural networks

S Du, J Lee, H Li, L Wang… - … conference on machine …, 2019 - proceedings.mlr.press
Gradient descent finds a global minimum in training deep neural networks despite the
objective function being non-convex. The current paper proves gradient descent achieves …
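Informally, and restating the two-layer version of this argument as a sketch (the width requirement and constants are omitted), the guarantee says that gradient descent with a small enough step size $\eta$ contracts the training error geometrically:

$$\|y-u(t)\|_2^2 \;\le\; \left(1-\tfrac{\eta\,\lambda_{\min}(H^{\infty})}{2}\right)^{t}\|y-u(0)\|_2^2,$$

where $u(t)$ stacks the network's predictions at step $t$ and $H^{\infty}$ is the limiting NTK Gram matrix (closed form sketched above).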

Learning and generalization in overparameterized neural networks, going beyond two layers

Z Allen-Zhu, Y Li, Y Liang - Advances in neural information …, 2019 - proceedings.neurips.cc

Gradient descent provably optimizes over-parameterized neural networks

SS Du, X Zhai, B Poczos, A Singh - arXiv preprint arXiv:1810.02054, 2018 - arxiv.org
One of the mysteries in the success of neural networks is that randomly initialized first-order
methods like gradient descent can achieve zero training loss even though the objective …
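A minimal empirical companion to this claim (width, step size, and iteration count are my illustrative choices; the fixed ±1 second layer follows the paper's setting): full-batch gradient descent on a wide two-layer ReLU network drives the squared loss toward zero from random initialization:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, m = 20, 5, 2000                      # n samples, width m >> n
X = rng.standard_normal((n, d))
X /= np.linalg.norm(X, axis=1, keepdims=True)   # unit-norm inputs
y = rng.standard_normal(n)

W = rng.standard_normal((m, d))            # first layer (trained)
a = rng.choice([-1.0, 1.0], size=m)        # second layer fixed at +/-1

def predict(W):
    return (np.maximum(X @ W.T, 0.0) @ a) / np.sqrt(m)

eta = 0.1
for _ in range(5000):
    r = predict(W) - y                     # residuals
    act = (X @ W.T > 0).astype(float)      # ReLU gates, shape (n, m)
    # dL/dW_j = (1/sqrt(m)) * a_j * sum_i r_i * 1{w_j . x_i > 0} * x_i
    grad = ((r[:, None] * act) * a[None, :]).T @ X / np.sqrt(m)
    W -= eta * grad

print(0.5 * np.sum((predict(W) - y) ** 2))  # training loss, close to 0
```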