Optimization for deep learning: An overview
RY Sun - Journal of the Operations Research Society of China, 2020 - Springer
Optimization is a critical component in deep learning. We think optimization for neural
networks is an interesting topic for theoretical research due to various reasons. First, its …
The global landscape of neural networks: An overview
One of the major concerns for neural network training is that the nonconvexity of the
associated loss functions may cause a bad landscape. The recent success of neural …
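Not from the paper, just a minimal numpy sketch of the kind of probe used in landscape studies: evaluating the training loss of a tiny two-layer network along one random direction in parameter space. The network, data, and direction are illustrative assumptions; such one-dimensional slices are how non-convex structure is often visualized.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data and a tiny two-layer network: f(x) = W2 @ tanh(W1 @ x).
X = rng.normal(size=(3, 50))           # 3 input features, 50 samples
y = rng.normal(size=(1, 50))
W1 = rng.normal(size=(8, 3))
W2 = rng.normal(size=(1, 8))

def loss(W1, W2):
    pred = W2 @ np.tanh(W1 @ X)
    return float(np.mean((pred - y) ** 2))

# One-dimensional slice of the landscape: L(theta + t * d) for a random direction d.
D1, D2 = rng.normal(size=W1.shape), rng.normal(size=W2.shape)
for t in np.linspace(-2.0, 2.0, 9):
    print(f"t={t:+.2f}  loss={loss(W1 + t * D1, W2 + t * D2):.4f}")
```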
On the opportunities and risks of foundation models
AI is undergoing a paradigm shift with the rise of models (eg, BERT, DALL-E, GPT-3) that are
trained on broad data at scale and are adaptable to a wide range of downstream tasks. We …
Exploring deep neural networks via layer-peeled model: Minority collapse in imbalanced training
In this paper, we introduce the Layer-Peeled Model, a nonconvex, yet analytically tractable,
optimization program, in a quest to better understand deep neural networks that are trained …
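A hedged note on the program named in this entry: the Layer-Peeled Model treats the last-layer features as free optimization variables alongside the classifier weights. Assuming its commonly stated form (with K classes, n_k examples in class k, N the total sample count, and norm budgets E_W, E_H as notation chosen here), it reads roughly as:

```latex
\min_{\mathbf{W},\,\mathbf{H}} \;
  \frac{1}{N}\sum_{k=1}^{K}\sum_{i=1}^{n_k}
  \mathcal{L}\bigl(\mathbf{W}\mathbf{h}_{k,i},\, \mathbf{y}_k\bigr)
\quad \text{s.t.} \quad
  \frac{1}{K}\sum_{k=1}^{K}\lVert \mathbf{w}_k\rVert_2^2 \le E_W,
\qquad
  \frac{1}{N}\sum_{k=1}^{K}\sum_{i=1}^{n_k}\lVert \mathbf{h}_{k,i}\rVert_2^2 \le E_H
```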
Deep learning versus kernel learning: an empirical study of loss landscape geometry and the time evolution of the neural tangent kernel
In suitably initialized wide networks, small learning rates transform deep neural networks
(DNNs) into neural tangent kernel (NTK) machines, whose training dynamics is well …
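As a hedged aside on the object this entry studies: the empirical NTK of a network f(x; theta) is the Gram matrix of its parameter gradients, K(x, x') = <grad_theta f(x; theta), grad_theta f(x'; theta)>. A minimal numpy sketch with an assumed tiny MLP and finite-difference gradients, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(0)

# Tiny scalar-output MLP f(x; theta), with theta packed as one flat vector.
W1, b1 = rng.normal(size=(8, 3)) / np.sqrt(3), np.zeros(8)
W2 = rng.normal(size=(1, 8)) / np.sqrt(8)
theta0 = np.concatenate([W1.ravel(), b1, W2.ravel()])

def f(x, theta):
    W1 = theta[:24].reshape(8, 3)
    b1 = theta[24:32]
    W2 = theta[32:].reshape(1, 8)
    return float(W2 @ np.tanh(W1 @ x + b1))

def grad(x, theta, eps=1e-5):
    # Finite-difference gradient of f(x; .) with respect to the parameters.
    g = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta); e[i] = eps
        g[i] = (f(x, theta + e) - f(x, theta - e)) / (2 * eps)
    return g

# Empirical NTK on a few inputs: K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>.
xs = [rng.normal(size=3) for _ in range(4)]
G = np.stack([grad(x, theta0) for x in xs])
K = G @ G.T
print(np.round(K, 3))
```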
Model merging in LLMs, MLLMs, and beyond: Methods, theories, applications and opportunities
Model merging is an efficient empowerment technique in the machine learning community
that does not require the collection of raw training data and does not require expensive …
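A minimal sketch of the simplest data-free merge alluded to here: uniform (or weighted) averaging of parameters across models that share an architecture. The parameter names are invented for illustration; practical merging methods surveyed in the paper add considerably more structure.

```python
import numpy as np

def merge_average(state_dicts, weights=None):
    """Merge models with identical architectures by (weighted) parameter averaging."""
    n = len(state_dicts)
    weights = weights or [1.0 / n] * n
    keys = state_dicts[0].keys()
    return {k: sum(w * sd[k] for w, sd in zip(weights, state_dicts)) for k in keys}

# Two "fine-tuned" checkpoints with the same parameter names (illustrative only).
rng = np.random.default_rng(0)
model_a = {"layer.weight": rng.normal(size=(4, 4)), "layer.bias": rng.normal(size=4)}
model_b = {k: v + 0.1 * rng.normal(size=v.shape) for k, v in model_a.items()}

merged = merge_average([model_a, model_b])
print(merged["layer.bias"])
```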
Optimization for deep learning: theory and algorithms
R Sun - arXiv preprint arXiv:1912.08957, 2019 - arxiv.org
When and why can a neural network be successfully trained? This article provides an
overview of optimization algorithms and theory for training neural networks. First, we discuss …
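A hedged illustration of the kind of baseline algorithm such surveys analyze, heavy-ball SGD with momentum, applied here to an assumed ill-conditioned quadratic that stands in for a training loss:

```python
import numpy as np

def sgd_momentum(grad_fn, theta, lr=0.1, beta=0.9, steps=200):
    """Heavy-ball update: v <- beta * v + grad(theta); theta <- theta - lr * v."""
    v = np.zeros_like(theta)
    for _ in range(steps):
        v = beta * v + grad_fn(theta)
        theta = theta - lr * v
    return theta

# Illustrative stand-in for a training loss: the ill-conditioned quadratic
# 0.5 * theta^T A theta, whose gradient is A @ theta and whose minimizer is 0.
A = np.diag([10.0, 1.0])
theta_final = sgd_momentum(lambda th: A @ th, theta=np.array([1.0, 1.0]))
print(theta_final)   # close to the minimizer at the origin
```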
Mechanistic mode connectivity
We study neural network loss landscapes through the lens of mode connectivity, the
observation that minimizers of neural networks retrieved via training on a dataset are …
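A minimal sketch (assumed toy model and data, not the paper's setup) of the basic measurement behind mode connectivity: train two solutions independently, then evaluate the loss along the straight line between them; a bump in the middle is the barrier that connectivity analyses try to explain or remove.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(2, 40))
y = np.sin(X.sum(axis=0))

def loss(theta):
    # Tiny two-layer tanh network, 6 hidden units, parameters packed in one vector.
    W1, W2 = theta[:12].reshape(6, 2), theta[12:].reshape(1, 6)
    return float(np.mean((W2 @ np.tanh(W1 @ X) - y) ** 2))

def train(theta, lr=0.02, steps=3000, eps=1e-5):
    # Crude finite-difference gradient descent, enough to settle into one basin.
    for _ in range(steps):
        g = np.array([(loss(theta + eps * e) - loss(theta - eps * e)) / (2 * eps)
                      for e in np.eye(theta.size)])
        theta = theta - lr * g
    return theta

theta_a = train(rng.normal(size=18))
theta_b = train(rng.normal(size=18))

# Loss along the straight line (1 - a) * theta_a + a * theta_b between the two minima.
for a in np.linspace(0.0, 1.0, 6):
    print(f"alpha={a:.1f}  loss={loss((1 - a) * theta_a + a * theta_b):.4f}")
```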
What Happens after SGD Reaches Zero Loss?--A Mathematical Framework
Understanding the implicit bias of Stochastic Gradient Descent (SGD) is one of the key
challenges in deep learning, especially for overparametrized models, where the local …
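This is only a toy numpy illustration of the phenomenon the paper formalizes, not its framework: once the iterate sits on a zero-loss manifold, gradient noise (here label noise) produces a slow implicit drift along the manifold toward its flattest point. The two-parameter model u * v, the step size, and the noise level are all assumptions made for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy overparametrized model: predict the constant target 1 with f(u, v) = u * v.
# Every point on the curve u * v = 1 gives exactly zero clean loss; the flattest
# of these minima (smallest Hessian trace, proportional to u**2 + v**2) is u = v = 1.
u, v = 2.0, 0.5            # a zero-loss but comparatively sharp starting point
lr, sigma = 0.02, 0.5      # step size and label-noise level (illustrative values)

for step in range(50001):
    y = 1.0 + sigma * rng.normal()       # noisy label keeps the gradient nonzero
    r = u * v - y                        # residual of the per-step loss (u*v - y)**2
    u, v = u - lr * 2 * r * v, v - lr * 2 * r * u
    if step % 10000 == 0:
        print(f"step={step:6d}  u={u:.3f}  v={v:.3f}  u*v={u*v:.3f}")
```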
Re-basin via implicit sinkhorn differentiation
The recent emergence of new algorithms for permuting models into functionally equivalent
regions of the solution space has shed some light on the complexity of error surfaces and …
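A minimal sketch of the permutation symmetry these re-basin methods exploit: reordering the hidden units of a layer, while permuting the rows and columns of the adjacent weight matrices to match, gives a different parameter vector that computes exactly the same function. The Sinkhorn machinery in the paper is about learning such permutations and is not shown here; the network below is an assumed toy.

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny two-layer MLP: f(x) = W2 @ relu(W1 @ x + b1).
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
W2 = rng.normal(size=(2, 5))
relu = lambda z: np.maximum(z, 0.0)

def f(x, W1, b1, W2):
    return W2 @ relu(W1 @ x + b1)

# Permute the 5 hidden units with a random permutation matrix P, adjusting the
# surrounding weights so the permuted network is functionally identical.
P = np.eye(5)[rng.permutation(5)]
W1p, b1p, W2p = P @ W1, P @ b1, W2 @ P.T

x = rng.normal(size=3)
print(f(x, W1, b1, W2))
print(f(x, W1p, b1p, W2p))   # same outputs: both weight settings lie in one "mode"
```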