Proving the lottery ticket hypothesis: Pruning is all you need
The lottery ticket hypothesis (Frankle and Carbin, 2018) states that a randomly initialized
network contains a small subnetwork that, when trained in isolation, can compete with …
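The subnetwork in question is typically exposed by magnitude pruning: zeroing out all but the largest-magnitude weights via a binary mask. A minimal sketch of that masking step (the layer shape and the 90% sparsity level are illustrative assumptions, not values from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)

# Randomly initialized weight matrix (hypothetical single-layer example).
W = rng.normal(size=(256, 256))

# Lottery-ticket-style magnitude pruning: keep only the largest weights
# by absolute value and zero the rest with a binary mask. The sparsity
# level below is an assumption for illustration.
sparsity = 0.9
threshold = np.quantile(np.abs(W), sparsity)
mask = (np.abs(W) >= threshold).astype(W.dtype)
W_sub = W * mask  # the candidate "winning ticket" subnetwork's weights

print(f"fraction of weights kept: {mask.mean():.2f}")
```

In the full procedure the surviving weights are reset to their original random initialization and the masked subnetwork is trained on its own.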
Statistical-query lower bounds via functional gradients
We give the first statistical-query lower bounds for agnostically learning any non-polynomial
activation with respect to Gaussian marginals (e.g., ReLU, sigmoid, sign). For the specific …
A physics-informed multi-agent model to predict thermo-oxidative/hydrolytic aging of elastomers
This paper introduces a novel physics-informed multi-agent constitutive model to predict
the quasi-static constitutive behavior of cross-linked elastomers and the loss of …
Memory capacity of neural networks with threshold and rectified linear unit activations
R Vershynin - SIAM Journal on Mathematics of Data Science, 2020 - SIAM
Overwhelming theoretical and empirical evidence shows that mildly overparametrized
neural networks---those with more connections than the size of the training data---are often …
AESPA: Accuracy preserving low-degree polynomial activation for fast private inference
The hybrid private inference (PI) protocol, which synergistically utilizes both multi-party
computation (MPC) and homomorphic encryption, is one of the most prominent techniques …
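The motivation for low-degree polynomial activations is that MPC and homomorphic encryption handle additions and multiplications cheaply, while exact nonlinearities like ReLU are expensive. A minimal sketch of the underlying idea, replacing ReLU with a degree-2 least-squares fit (the fitting interval and degree are illustrative assumptions, not the AESPA construction itself):

```python
import numpy as np

# Fit a degree-2 polynomial to ReLU over [-3, 3] as an illustrative
# stand-in for a low-degree polynomial activation. Under MPC/HE, this
# polynomial can be evaluated with two multiplications instead of a
# costly secure comparison.
x = np.linspace(-3, 3, 1001)
relu = np.maximum(x, 0.0)
coeffs = np.polyfit(x, relu, deg=2)   # [c2, c1, c0], highest degree first
poly_act = np.polyval(coeffs, x)

max_err = np.max(np.abs(poly_act - relu))
print(f"coefficients: {coeffs.round(3)}, max abs error on [-3, 3]: {max_err:.3f}")
```

By symmetry of ReLU(x) = (x + |x|)/2, the linear coefficient of the fit is exactly 0.5; the approximation error is what accuracy-preserving schemes such as AESPA work to control.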
A modular analysis of provable acceleration via Polyak's momentum: Training a wide ReLU network and a deep linear network
Incorporating a so-called “momentum” dynamic in gradient descent methods is widely used
in neural net training as it has been broadly observed that, at least empirically, it often leads …
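On quadratic objectives, Polyak's heavy-ball update and its accelerated rate can be seen directly. A textbook sketch with the classical parameter choices (this illustrates the momentum dynamic itself, not the paper's neural-network analysis):

```python
import numpy as np

# Polyak's heavy-ball method on an ill-conditioned quadratic
# f(x) = 0.5 * x^T A x, minimized at x = 0. Step size eta and momentum
# beta are the classical optimal choices for quadratics, giving a
# convergence rate depending on sqrt(kappa) rather than kappa.
A = np.diag([1.0, 100.0])          # condition number kappa = 100
mu, L = 1.0, 100.0
eta = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2
beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2

x = x_prev = np.array([1.0, 1.0])
for _ in range(200):
    grad = A @ x
    # x_{t+1} = x_t - eta * grad + beta * (x_t - x_{t-1})
    x, x_prev = x - eta * grad + beta * (x - x_prev), x

print(f"distance to minimizer after 200 steps: {np.linalg.norm(x):.2e}")
```

Plain gradient descent on the same problem contracts at roughly (kappa - 1)/(kappa + 1) per step, which is why the momentum term is so widely used in practice.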
Non-asymptotic approximations of neural networks by Gaussian processes
We study the extent to which wide neural networks may be approximated by Gaussian
processes, when initialized with random weights. It is a well-established fact that as the …
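The well-established limiting fact the entry refers to can be checked by Monte Carlo: the output of a randomly initialized wide one-hidden-layer network at a fixed input is approximately Gaussian, with variance given by the corresponding kernel. A sketch (width, sample count, and the ReLU choice are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Output of a one-hidden-layer ReLU network with i.i.d. standard normal
# weights and 1/sqrt(width) output scaling, at a fixed unit-norm input.
width, n_samples = 512, 5000
x = np.array([1.0, 0.0])

W = rng.normal(size=(n_samples, width, 2))   # hidden-layer weights
v = rng.normal(size=(n_samples, width))      # output-layer weights
pre = np.maximum(W @ x, 0.0)                 # hidden activations
out = (v * pre).sum(axis=1) / np.sqrt(width)

# In the infinite-width limit, out ~ N(0, E[relu(g)^2]) with g ~ N(0,1),
# i.e. variance 1/2 for a unit-norm input.
print(f"empirical variance: {out.var():.3f} (limiting value: 0.5)")
```

The paper's contribution is non-asymptotic: quantifying how close this distribution is to the Gaussian limit at finite width, rather than only in the limit.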
DiGRAF: Diffeomorphic graph-adaptive activation function
In this paper, we propose a novel activation function tailored specifically for graph data in
Graph Neural Networks (GNNs). Motivated by the need for graph-adaptive and flexible …
Effects of nonlinearity and network architecture on the performance of supervised neural networks
The nonlinearity of activation functions used in deep learning models is crucial for the
success of predictive models. Several simple nonlinear functions, including Rectified Linear …
Characterizing the spectrum of the NTK via a power series expansion
Under mild conditions on the network initialization we derive a power series expansion for
the Neural Tangent Kernel (NTK) of arbitrarily deep feedforward networks in the infinite …
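In the infinite-width limit, the NTK of such networks is a function of the input correlation rho = <x, x'> (for unit-norm inputs), built from Gaussian expectations of the activation. A sketch verifying one such building block for ReLU against its known closed form, the first-order arc-cosine kernel (this is an illustrative check of the kernel object being expanded, not the paper's power series itself):

```python
import numpy as np

rng = np.random.default_rng(1)

# Monte-Carlo estimate of E[relu(u) relu(v)] for jointly Gaussian
# (u, v) with unit variances and correlation rho, compared with the
# closed-form arc-cosine kernel expression.
rho = 0.3
n = 2_000_000
u = rng.normal(size=n)
w = rng.normal(size=n)
v = rho * u + np.sqrt(1 - rho**2) * w   # corr(u, v) = rho

mc = np.mean(np.maximum(u, 0) * np.maximum(v, 0))
closed = (np.sqrt(1 - rho**2) + rho * (np.pi - np.arccos(rho))) / (2 * np.pi)
print(f"Monte Carlo: {mc:.4f}, closed form: {closed:.4f}")
```

Expanding such kernels as power series in rho is what lets the paper characterize the NTK's eigenvalue spectrum through the series coefficients.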