Recent advances in stochastic gradient descent in deep learning
In the age of artificial intelligence, finding the best approach to handling huge amounts of data is a
tremendously motivating and hard problem. Among machine learning models, stochastic …
Neural collapse: A review on modelling principles and generalization
V Kothapalli - arXiv preprint arXiv:2206.04041, 2022 - arxiv.org
Deep classifier neural networks enter the terminal phase of training (TPT) when training
error reaches zero and tend to exhibit intriguing Neural Collapse (NC) properties. Neural …
Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks
The growing energy and performance costs of deep learning have driven the community to
reduce the size of neural networks by selectively pruning components. Similarly to their …
Autoformer: Searching transformers for visual recognition
Recently, pure transformer-based models have shown great potential for vision tasks such
as image classification and detection. However, the design of transformer networks is …
Fedbn: Federated learning on non-iid features via local batch normalization
The emerging paradigm of federated learning (FL) strives to enable collaborative training of
deep models on the network edge without centrally aggregating raw data and hence …
Hidden progress in deep learning: Sgd learns parities near the computational limit
There is mounting evidence of emergent phenomena in the capabilities of deep learning
methods as we scale up datasets, model sizes, and training times. While there are some …
Towards understanding ensemble, knowledge distillation and self-distillation in deep learning
We formally study how ensembles of deep learning models can improve test accuracy, and
how the superior performance of an ensemble can be distilled into a single model using …
Fit without fear: remarkable mathematical phenomena of deep learning through the prism of interpolation
M Belkin - Acta Numerica, 2021 - cambridge.org
In the past decade the mathematical theory of machine learning has lagged far behind the
triumphs of deep neural networks on practical challenges. However, the gap between theory …
Gradient starvation: A learning proclivity in neural networks
We identify and formalize a fundamental gradient descent phenomenon resulting in a
learning proclivity in over-parameterized neural networks. Gradient Starvation arises when …
Theory of overparametrization in quantum neural networks
The prospect of achieving quantum advantage with quantum neural networks (QNNs) is
exciting. Understanding how QNN properties (for example, the number of parameters M) …