Proving the lottery ticket hypothesis: Pruning is all you need

E Malach, G Yehudai… - International …, 2020 - proceedings.mlr.press
The lottery ticket hypothesis (Frankle and Carbin, 2018) states that a randomly-initialized
network contains a small subnetwork that, when trained in isolation, can compete with …

Statistical-query lower bounds via functional gradients

S Goel, A Gollakota, A Klivans - Advances in Neural …, 2020 - proceedings.neurips.cc
We give the first statistical-query lower bounds for agnostically learning any non-polynomial
activation with respect to Gaussian marginals (e.g., ReLU, sigmoid, sign). For the specific …

A physics-informed multi-agents model to predict thermo-oxidative/hydrolytic aging of elastomers

A Ghaderi, V Morovati, Y Chen, R Dargazany - International Journal of …, 2022 - Elsevier
This paper introduces a novel physics-informed multi-agent constitutive model to predict
the quasi-static constitutive behavior of cross-linked elastomers and the loss of …

Memory capacity of neural networks with threshold and rectified linear unit activations

R Vershynin - SIAM Journal on Mathematics of Data Science, 2020 - SIAM
Overwhelming theoretical and empirical evidence shows that mildly overparametrized
neural networks, those with more connections than the size of the training data, are often …

AESPA: Accuracy preserving low-degree polynomial activation for fast private inference

J Park, MJ Kim, W Jung, JH Ahn - arXiv preprint arXiv:2201.06699, 2022 - arxiv.org
The hybrid private inference (PI) protocol, which synergistically utilizes both multi-party
computation (MPC) and homomorphic encryption, is one of the most prominent techniques …

A modular analysis of provable acceleration via Polyak's momentum: Training a wide ReLU network and a deep linear network

JK Wang, CH Lin, JD Abernethy - … Conference on Machine …, 2021 - proceedings.mlr.press
Incorporating a so-called “momentum” dynamic into gradient descent methods is a widely used
practice in neural net training, as it has been broadly observed that, at least empirically, it often leads …

Non-asymptotic approximations of neural networks by Gaussian processes

R Eldan, D Mikulincer… - Conference on Learning …, 2021 - proceedings.mlr.press
We study the extent to which wide neural networks may be approximated by Gaussian
processes when initialized with random weights. It is a well-established fact that as the …

DiGRAF: Diffeomorphic graph-adaptive activation function

KSI Mantri, X Wang, CB Schönlieb, B Ribeiro… - arXiv preprint arXiv …, 2024 - arxiv.org
In this paper, we propose a novel activation function tailored specifically for graph data in
Graph Neural Networks (GNNs). Motivated by the need for graph-adaptive and flexible …

Effects of nonlinearity and network architecture on the performance of supervised neural networks

N Kulathunga, NR Ranasinghe, D Vrinceanu… - Algorithms, 2021 - mdpi.com
The nonlinearity of activation functions used in deep learning models is crucial for the
success of predictive models. Several simple nonlinear functions, including Rectified Linear …

Characterizing the spectrum of the NTK via a power series expansion

M Murray, H Jin, B Bowman, G Montufar - arXiv preprint arXiv:2211.07844, 2022 - arxiv.org
Under mild conditions on the network initialization, we derive a power series expansion for
the Neural Tangent Kernel (NTK) of arbitrarily deep feedforward networks in the infinite …