Pixelated butterfly: Simple and efficient sparse training for neural network models
Overparameterized neural networks generalize well but are expensive to train. Ideally, one
would like to reduce their computational cost while retaining their generalization benefits …
Width and depth limits commute in residual networks
We show that taking the width and depth to infinity in a deep neural network with skip
connections, when branches are scaled by $1/\sqrt{\text{depth}}$, results in the same covariance …
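The $1/\sqrt{\text{depth}}$ branch scaling described in this snippet can be checked numerically. Below is a minimal sketch (assumed setup: a toy residual MLP with i.i.d. Gaussian weights and ReLU branches; the widths, weight variance, and function names are illustrative choices, not taken from the paper) showing that the per-unit second moment of the forward signal stays of order one as depth grows.

```python
import numpy as np

def residual_forward(x, depth, width, rng):
    """Forward pass of a toy residual MLP whose branches are scaled by 1/sqrt(depth)."""
    h = x
    for _ in range(depth):
        # i.i.d. Gaussian weights with He-style variance 2/width (an assumption here)
        W = rng.normal(0.0, np.sqrt(2.0 / width), size=(width, width))
        h = h + np.maximum(W @ h, 0.0) / np.sqrt(depth)  # scaled residual branch
    return h

rng = np.random.default_rng(0)
width = 256
x = rng.normal(size=width)
for depth in (10, 100, 1000):
    out = residual_forward(x, depth, width, rng)
    # Per-unit second moment stays O(1) instead of blowing up or collapsing with depth.
    print(depth, float(np.mean(out ** 2)))
```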
On the infinite-depth limit of finite-width neural networks
S Hayou - Transactions on Machine Learning Research, 2022 - openreview.net
In this paper, we study the infinite-depth limit of finite-width residual neural networks with
random Gaussian weights. With proper scaling, we show that by fixing the width and taking …
Infinitely deep neural networks as diffusion processes
When the parameters are independently and identically distributed (initialized), neural
networks exhibit undesirable properties that emerge as the number of layers increases, e.g. a …
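The undesirable depth behaviour alluded to here (vanishing or exploding signals under plain i.i.d. initialization) is easy to reproduce. This is a minimal illustrative sketch assuming a non-residual ReLU MLP; the width, depth, and weight scales are arbitrary choices, not from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
width, depth = 256, 50
x0 = rng.normal(size=width)

# Propagate through a plain (non-residual) ReLU MLP with i.i.d. N(0, sigma^2/width) weights.
# Each layer scales the squared norm by roughly sigma^2 / 2, so the signal vanishes for
# sigma^2 < 2 and explodes for sigma^2 > 2 as layers are stacked.
for sigma in (1.0, np.sqrt(2.0), 2.0):
    x = x0.copy()
    for _ in range(depth):
        W = rng.normal(0.0, sigma / np.sqrt(width), size=(width, width))
        x = np.maximum(W @ x, 0.0)
    print(f"sigma={sigma:.3f}  ||x_depth|| / ||x_0|| = "
          f"{np.linalg.norm(x) / np.linalg.norm(x0):.3e}")
```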
Neural spectrum alignment: Empirical study
Expressiveness and generalization of deep models were recently addressed via the
connection between neural networks (NNs) and kernel learning, where first-order dynamics …
Commutative Width and Depth Scaling in Deep Neural Networks
S Hayou - arXiv preprint arXiv:2310.01683, 2023 - arxiv.org
This paper is the second in the series Commutative Scaling of Width and Depth (WD), about the
commutativity of infinite width and depth limits in deep neural networks. Our aim is to …
Doubly infinite residual neural networks: a diffusion process approach
Modern neural networks featuring a large number of layers (depth) and units per layer
(width) have achieved remarkable performance across many domains. While there exists …
Theory of Deep Learning: Neural Tangent Kernel and Beyond
AU Jacot-Guillarmod - 2022 - infoscience.epfl.ch
In recent years, Deep Neural Networks (DNNs) have managed to succeed at tasks that
previously appeared impossible, such as human-level object recognition, text synthesis …
Doubly infinite residual networks: a diffusion process approach
When a neural network's parameters are initialized i.i.d., the network exhibits undesirable
forward and backward properties as the number of layers increases, e.g. vanishing …
Wide Neural Networks are Interpolating Kernel Methods: Impact of Initialization on Generalization
The recently developed link between strongly overparametrized neural networks (NNs) and
kernel methods has opened a new way to understand puzzling features of NNs, such as …
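Several of the entries above rest on the NN-kernel correspondence, so a minimal sketch of the empirical neural tangent kernel may help: for a two-layer ReLU network in the NTK parameterization $f(x) = a^\top \mathrm{relu}(Wx)/\sqrt{m}$, the kernel is the inner product of parameter gradients. The architecture and all names below are illustrative assumptions, not code from any of the papers listed.

```python
import numpy as np

def empirical_ntk(x1, x2, W, a):
    """K(x1, x2) = <grad_theta f(x1), grad_theta f(x2)> for f(x) = a^T relu(W x) / sqrt(m)."""
    m = W.shape[0]
    pre1, pre2 = W @ x1, W @ x2
    h1, h2 = np.maximum(pre1, 0.0), np.maximum(pre2, 0.0)
    k_a = (h1 @ h2) / m                                          # gradients w.r.t. output weights a
    k_W = ((a * (pre1 > 0)) @ (a * (pre2 > 0))) * (x1 @ x2) / m  # gradients w.r.t. hidden weights W
    return k_a + k_W

rng = np.random.default_rng(0)
d, m = 10, 4096                       # input dimension, hidden width
W, a = rng.normal(size=(m, d)), rng.normal(size=m)
x1, x2 = rng.normal(size=d), rng.normal(size=d)
# At large width this empirical kernel concentrates around a deterministic limit,
# which is what allows wide networks to be analysed as kernel methods.
print(empirical_ntk(x1, x2, W, a))
```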