Neural collapse: A review on modelling principles and generalization

V Kothapalli - arXiv preprint arXiv:2206.04041, 2022 - arxiv.org
Deep classifier neural networks enter the terminal phase of training (TPT) when training
error reaches zero and tend to exhibit intriguing Neural Collapse (NC) properties. Neural …
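
As a quick illustration of what the NC properties measure, the sketch below (not taken from the paper; the feature and label arrays and the exact normalization are assumptions) computes the within-class variability collapse statistic, often called NC1, from penultimate-layer features; during TPT it tends toward zero.

```python
# Sketch (illustrative, not from the paper): within-class variability collapse (NC1),
# one of the Neural Collapse properties, computed from penultimate-layer features.
# Assumes `features` is an (N, d) array and `labels` an (N,) array of class ids.
import numpy as np

def nc1_within_class_collapse(features: np.ndarray, labels: np.ndarray) -> float:
    global_mean = features.mean(axis=0)
    classes = np.unique(labels)
    d = features.shape[1]
    sigma_w = np.zeros((d, d))  # within-class covariance
    sigma_b = np.zeros((d, d))  # between-class covariance
    for c in classes:
        fc = features[labels == c]
        mu_c = fc.mean(axis=0)
        centered = fc - mu_c
        sigma_w += centered.T @ centered / features.shape[0]
        diff = (mu_c - global_mean)[:, None]
        sigma_b += diff @ diff.T / len(classes)
    # NC1 is commonly reported as trace(Sigma_W @ pinv(Sigma_B)) / num_classes;
    # it shrinks toward zero as within-class variability collapses.
    return float(np.trace(sigma_w @ np.linalg.pinv(sigma_b)) / len(classes))
```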

Optimization for deep learning: An overview

RY Sun - Journal of the Operations Research Society of China, 2020 - Springer
Optimization is a critical component in deep learning. We think optimization for neural
networks is an interesting topic for theoretical research due to various reasons. First, its …

TIES-Merging: Resolving interference when merging models

P Yadav, D Tam, L Choshen… - Advances in Neural …, 2023 - proceedings.neurips.cc
Transfer learning, i.e., further fine-tuning a pre-trained model on a downstream task, can
confer significant advantages, including improved downstream performance, faster …
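
For context, the snippet below is a rough sketch of the trim / elect-sign / disjoint-merge recipe the title refers to, reconstructed from general knowledge of the method rather than from this abstract; the function name, the keep_frac parameter, and the flat-array parameterization are illustrative assumptions.

```python
# Rough sketch of a TIES-style merge of several fine-tuned checkpoints into one model:
# trim small task-vector entries, elect a sign per parameter, then average only the
# entries that agree with the elected sign. Assumes parameters flattened to arrays.
import numpy as np

def ties_merge(base: np.ndarray, finetuned: list[np.ndarray], keep_frac: float = 0.2):
    task_vectors = [ft - base for ft in finetuned]
    trimmed = []
    for tv in task_vectors:
        k = max(1, int(keep_frac * tv.size))
        threshold = np.sort(np.abs(tv).ravel())[-k]       # top-k magnitude cutoff
        trimmed.append(np.where(np.abs(tv) >= threshold, tv, 0.0))
    stacked = np.stack(trimmed)                            # (num_models, ...)
    elected_sign = np.sign(stacked.sum(axis=0))            # sign with larger total mass
    agree = (np.sign(stacked) == elected_sign) & (stacked != 0)
    counts = np.maximum(agree.sum(axis=0), 1)
    merged_tv = (stacked * agree).sum(axis=0) / counts     # disjoint mean of agreeing values
    return base + merged_tv
```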

The role of permutation invariance in linear mode connectivity of neural networks

R Entezari, H Sedghi, O Saukh… - arXiv preprint arXiv …, 2021 - arxiv.org
In this paper, we conjecture that if the permutation invariance of neural networks is taken into
account, SGD solutions will likely have no barrier in the linear interpolation between them …
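
The sketch below (illustrative only, not the paper's matching procedure) shows the permutation invariance in question for a two-layer MLP: reordering the hidden units, together with the matching rows and columns of the adjacent weight matrices, leaves the network's function unchanged, which is why such permutations can be factored out before interpolating two SGD solutions.

```python
# Sketch: permuting hidden units of a two-layer MLP leaves its outputs unchanged.
import numpy as np

def mlp(x, W1, b1, W2, b2):
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2   # ReLU MLP

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
W1, b1 = rng.normal(size=(8, 16)), rng.normal(size=16)
W2, b2 = rng.normal(size=(16, 3)), rng.normal(size=3)

perm = rng.permutation(16)                            # reorder the 16 hidden units
W1p, b1p, W2p = W1[:, perm], b1[perm], W2[perm, :]    # permute matching rows/columns

assert np.allclose(mlp(x, W1, b1, W2, b2), mlp(x, W1p, b1p, W2p, b2))
```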

S4L: Self-supervised semi-supervised learning

X Zhai, A Oliver, A Kolesnikov… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
This work tackles the problem of semi-supervised learning of image classifiers. Our main
insight is that the field of semi-supervised learning can benefit from the quickly advancing …

Linear mode connectivity and the lottery ticket hypothesis

J Frankle, GK Dziugaite, D Roy… - … on Machine Learning, 2020 - proceedings.mlr.press
We study whether a neural network optimizes to the same, linearly connected minimum
under different samples of SGD noise (e.g., random data order and augmentation). We find …
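
A minimal sketch of the quantity this line of work studies, assuming flat parameter vectors theta_a and theta_b from two training runs and a loss_fn that evaluates a fixed architecture at given parameters (all names here are assumptions): the loss barrier along the straight line between the two solutions.

```python
# Sketch: estimate the loss barrier along the linear path between two checkpoints.
import numpy as np

def linear_path_barrier(theta_a, theta_b, loss_fn, steps: int = 11) -> float:
    alphas = np.linspace(0.0, 1.0, steps)
    losses = [loss_fn((1 - a) * theta_a + a * theta_b) for a in alphas]
    endpoint_avg = 0.5 * (losses[0] + losses[-1])
    # Linear mode connectivity holds when the worst interpolated loss does not rise
    # noticeably above the average endpoint loss, i.e. the barrier is close to zero.
    return float(max(losses) - endpoint_avg)
```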

Fine-grained analysis of optimization and generalization for overparameterized two-layer neural networks

S Arora, S Du, W Hu, Z Li… - … conference on machine …, 2019 - proceedings.mlr.press
Recent works have cast some light on the mystery of why deep nets fit any data and
generalize despite being very overparametrized. This paper analyzes training and …

The modern mathematics of deep learning

J Berner, P Grohs, G Kutyniok… - arXiv preprint arXiv …, 2021 - cambridge.org
We describe the new field of the mathematical analysis of deep learning. This field emerged
around a list of research questions that were not answered within the classical framework of …

Gradient descent finds global minima of deep neural networks

S Du, J Lee, H Li, L Wang… - … conference on machine …, 2019 - proceedings.mlr.press
Gradient descent finds a global minimum in training deep neural networks despite the
objective function being non-convex. The current paper proves gradient descent achieves …
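
As a toy illustration of the claim (an assumed setup, unrelated to the paper's proof technique), the sketch below runs plain gradient descent on a small, very wide two-layer ReLU network and typically drives the non-convex training loss close to zero.

```python
# Toy demo: gradient descent on an over-parameterized two-layer ReLU network.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 5))                    # 20 samples, 5 features
y = rng.normal(size=(20, 1))
W1 = rng.normal(size=(5, 200)) / np.sqrt(5)     # wide hidden layer (over-parameterized)
W2 = rng.normal(size=(200, 1)) / np.sqrt(200)
lr = 0.05
for step in range(2000):
    H = np.maximum(X @ W1, 0.0)                 # ReLU features
    err = H @ W2 - y
    loss = 0.5 * np.mean(err ** 2)
    gW2 = H.T @ err / len(X)                    # gradient w.r.t. output weights
    gW1 = X.T @ ((err @ W2.T) * (H > 0)) / len(X)  # gradient w.r.t. hidden weights
    W1 -= lr * gW1
    W2 -= lr * gW2
print(f"final training loss: {loss:.2e}")       # typically near zero for a wide enough net
```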

ZipIt! Merging models from different tasks without training

G Stoica, D Bolya, J Bjorner, P Ramesh… - arXiv preprint arXiv …, 2023 - arxiv.org
Typical deep visual recognition models are capable of performing the one task they were
trained on. In this paper, we tackle the extremely difficult problem of combining completely …