Pixelated butterfly: Simple and efficient sparse training for neural network models

T Dao, B Chen, K Liang, J Yang, Z Song… - arXiv preprint arXiv …, 2021 - arxiv.org
Overparameterized neural networks generalize well but are expensive to train. Ideally, one
would like to reduce their computational cost while retaining their generalization benefits …

Width and depth limits commute in residual networks

S Hayou, G Yang - International Conference on Machine …, 2023 - proceedings.mlr.press
We show that taking the width and depth to infinity in a deep neural network with skip
connections, when branches are scaled by $1/\sqrt{\text{depth}}$, results in the same covariance …
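
The $1/\sqrt{\text{depth}}$ branch scaling mentioned in this abstract is easy to probe numerically. Below is a minimal NumPy sketch (an illustrative setup assumed here, not code or parameters from the paper): it forward-propagates an input through residual blocks $x_{l+1} = x_l + \frac{1}{\sqrt{L}} W_l \phi(x_l)$ with i.i.d. Gaussian weights and compares the output norm with and without the scaling.

    # Minimal sketch (assumed setup, not from the paper): residual forward pass
    # with branches scaled by 1/sqrt(depth) vs. unscaled, i.i.d. Gaussian weights.
    import numpy as np

    def output_norm(depth, width=256, scaled=True, seed=0):
        rng = np.random.default_rng(seed)
        x = rng.standard_normal(width) / np.sqrt(width)          # roughly unit-norm input
        branch_scale = 1.0 / np.sqrt(depth) if scaled else 1.0
        for _ in range(depth):
            W = rng.standard_normal((width, width)) / np.sqrt(width)
            x = x + branch_scale * W @ np.maximum(x, 0.0)        # skip connection + ReLU branch
        return np.linalg.norm(x)

    for L in (10, 100, 1000):
        print(L, output_norm(L, scaled=True), output_norm(L, scaled=False))

With the $1/\sqrt{\text{depth}}$ scaling the output norm stays of order one as the depth grows, consistent with the depth-independent covariance structure the abstract refers to; without the scaling the norm grows geometrically with depth.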

On the infinite-depth limit of finite-width neural networks

S Hayou - Transactions on Machine Learning Research, 2022 - openreview.net
In this paper, we study the infinite-depth limit of finite-width residual neural networks with
random Gaussian weights. With proper scaling, we show that by fixing the width and taking …

Infinitely deep neural networks as diffusion processes

S Peluchetti, S Favaro - International Conference on Artificial …, 2020 - proceedings.mlr.press
When the parameters are independently and identically distributed (initialized), neural
networks exhibit undesirable properties that emerge as the number of layers increases, e.g., a …

Neural spectrum alignment: Empirical study

D Kopitkov, V Indelman - Artificial Neural Networks and Machine Learning …, 2020 - Springer
The expressiveness and generalization of deep models were recently addressed via the
connection between neural networks (NNs) and kernel learning, where first-order dynamics …
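
The NN-kernel connection referenced in this snippet is typically studied through the empirical neural tangent kernel, the Gram matrix of per-example parameter gradients; the "spectrum" in the title is the eigenvalue spectrum of that matrix. A small self-contained NumPy sketch (a toy one-hidden-layer ReLU network and random data assumed purely for illustration, not the paper's experimental setup):

    # Sketch (toy setup): empirical neural tangent kernel K[i, j] = <grad_theta f(x_i), grad_theta f(x_j)>
    # for f(x) = v . relu(W x), and its eigenvalue spectrum.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d, m = 20, 5, 100                       # examples, input dim, hidden width
    X = rng.standard_normal((n, d))
    W = rng.standard_normal((m, d)) / np.sqrt(d)
    v = rng.standard_normal(m) / np.sqrt(m)

    def per_example_grad(x):
        pre = W @ x                            # hidden pre-activations
        gate = (pre > 0).astype(float)         # ReLU derivative
        grad_W = np.outer(v * gate, x)         # d f / d W
        grad_v = np.maximum(pre, 0.0)          # d f / d v
        return np.concatenate([grad_W.ravel(), grad_v])

    G = np.stack([per_example_grad(x) for x in X])   # n x (number of parameters)
    K = G @ G.T                                      # empirical NTK Gram matrix
    print(np.linalg.eigvalsh(K)[::-1][:5])           # top of the spectrum

Tracking how this spectrum evolves over the course of training is the kind of empirical measurement the paper's title refers to.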

Commutative Width and Depth Scaling in Deep Neural Networks

S Hayou - arXiv preprint arXiv:2310.01683, 2023 - arxiv.org
This paper is the second in the series Commutative Scaling of Width and Depth (WD), on the
commutativity of infinite-width and infinite-depth limits in deep neural networks. Our aim is to …

Doubly infinite residual neural networks: a diffusion process approach

S Peluchetti, S Favaro - Journal of Machine Learning Research, 2021 - jmlr.org
Modern neural networks featuring a large number of layers (depth) and units per layer
(width) have achieved remarkable performance across many domains. While there exists …

Theory of Deep Learning: Neural Tangent Kernel and Beyond

AU Jacot-Guillarmod - 2022 - infoscience.epfl.ch
In recent years, Deep Neural Networks (DNNs) have managed to succeed at tasks that
previously appeared impossible, such as human-level object recognition, text synthesis …

Doubly infinite residual networks: a diffusion process approach

S Peluchetti, S Favaro - stat, 2020 - researchgate.net
When a neural network's parameters are initialized as i.i.d., the network exhibits undesirable
forward and backward properties as the number of layers increases, e.g., vanishing …
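
The undesirable forward behaviour mentioned here (e.g., vanishing) can be reproduced with a plain i.i.d.-initialized feedforward network. The sketch below is a hypothetical illustration (architecture, width, and weight variances are arbitrary choices, not the paper's construction): it tracks the activation norm of a deep ReLU network for weight variances below, at, and above the critical value.

    # Sketch (assumed toy setup): forward signal in a plain i.i.d.-initialized deep ReLU
    # network, showing vanishing vs. exploding activation norms with depth.
    import numpy as np

    def signal_norm(depth, sigma2, width=256, seed=0):
        rng = np.random.default_rng(seed)
        x = rng.standard_normal(width)
        for _ in range(depth):
            W = rng.standard_normal((width, width)) * np.sqrt(sigma2 / width)
            x = np.maximum(W @ x, 0.0)         # fully connected layer + ReLU
        return np.linalg.norm(x)

    for sigma2 in (1.0, 2.0, 4.0):             # below / at / above the critical variance 2
        print(sigma2, ["%.2e" % signal_norm(L, sigma2) for L in (5, 20, 50)])

Below the critical weight variance the signal decays geometrically with depth (vanishing), and above it the signal grows geometrically (exploding); the scaled residual parameterizations studied in these papers aim to avoid exactly this degenerate depth behaviour.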

Wide Neural Networks are Interpolating Kernel Methods: Impact of Initialization on Generalization

M Nonnenmacher, D Reeb, I Steinwart - openreview.net
The recently developed link between strongly overparametrized neural networks (NNs) and
kernel methods has opened a new way to understand puzzling features of NNs, such as …