How neural networks extrapolate: From feedforward to graph neural networks
We study how neural networks trained by gradient descent extrapolate, i.e., what they learn
outside the support of the training distribution. Previous works report mixed empirical results …
High-dimensional dynamics of generalization error in neural networks
We perform an analysis of the average generalization dynamics of large neural networks
trained using gradient descent. We study the practically relevant “high-dimensional” regime …
Gradient-based feature learning under structured data
Recent works have demonstrated that the sample complexity of gradient-based learning of
single index models, i.e., functions that depend on a 1-dimensional projection of the input …
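For context, a single index model is standardly taken to be a target that depends on the input only through one linear projection, i.e. (standard definition, not quoted from the snippet above) $f_\star(\boldsymbol{x}) = g(\langle \boldsymbol{w}_\star, \boldsymbol{x} \rangle)$ for some direction $\boldsymbol{w}_\star \in \mathbb{R}^d$ and link function $g: \mathbb{R} \to \mathbb{R}$; the sample-complexity results refer to recovering this one relevant direction.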
Rethinking bias-variance trade-off for generalization of neural networks
The classical bias-variance trade-off predicts that bias decreases and variance increases with
model complexity, leading to a U-shaped risk curve. Recent work calls this into question for …
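For reference, the U-shaped curve mentioned above comes from the textbook squared-error decomposition (standard form, not quoted from the paper): $\mathbb{E}[(y - \hat f(x))^2] = (f(x) - \mathbb{E}[\hat f(x)])^2 + \operatorname{Var}(\hat f(x)) + \sigma^2$, i.e. bias$^2$ + variance + irreducible noise; the classical prediction is that increasing model complexity shrinks the first term while inflating the second.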
DiffusionShield: A watermark for copyright protection against generative diffusion models
Recently, Generative Diffusion Models (GDMs) have showcased their remarkable
capabilities in learning and generating images. A large community of GDMs has naturally …
Random features for kernel approximation: A survey on algorithms, theory, and beyond
The class of random features is one of the most popular techniques to speed up kernel
methods in large-scale problems. Related works have been recognized by the NeurIPS Test …
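As a concrete illustration of this class of techniques, below is a minimal sketch of random Fourier features for approximating a Gaussian (RBF) kernel, assuming NumPy; the function name and all parameter choices are illustrative and not taken from the survey.

import numpy as np

def random_fourier_features(X, num_features=500, gamma=1.0, seed=0):
    # Map X (n, d) to features whose inner products approximate exp(-gamma * ||x - y||^2).
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, num_features))  # frequencies from the kernel's spectral density
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)                # random phases
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

With such features, kernel ridge regression reduces to linear ridge regression on Z = random_fourier_features(X), so the cost scales with the number of features rather than quadratically with the number of training points.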
Universality laws for high-dimensional learning with random features
We prove a universality theorem for learning with random features. Our result shows that, in
terms of training and generalization errors, a random feature model with a nonlinear …
On the Optimal Weighted Regularization in Overparameterized Linear Regression
We consider the linear model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta}_{\star} + \boldsymbol{\varepsilon}$ with $\mathbf{X} \in \mathbb{R}^{n \times p}$ in the overparameterized regime $p > n$. We estimate $\boldsymbol{\beta}_{\star}$ …
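A minimal numerical sketch of the $p > n$ setup described above, assuming NumPy; it shows the generic minimum-norm and scalar-ridge estimators for orientation, not the paper's specific weighted-regularization scheme.

import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 200                                   # overparameterized regime: p > n
X = rng.normal(size=(n, p))
beta_star = rng.normal(size=p) / np.sqrt(p)
y = X @ beta_star + 0.1 * rng.normal(size=n)

# Minimum-norm interpolator (ridgeless limit): beta = X^T (X X^T)^{-1} y
beta_min_norm = X.T @ np.linalg.solve(X @ X.T, y)

# Scalar ridge estimator; weighted regularization replaces lam * I with a general penalty matrix
lam = 1e-2
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)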
Neural networks efficiently learn low-dimensional representations with SGD
We study the problem of training a two-layer neural network (NN) of arbitrary width using
stochastic gradient descent (SGD) where the input $\boldsymbol{x} \in \mathbb{R}^d$ is …
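A minimal sketch of the training setup the snippet describes (a two-layer ReLU network fit with plain SGD on inputs $\boldsymbol{x} \in \mathbb{R}^d$), assuming NumPy; the width, step size, and synthetic single-direction target are illustrative choices, not taken from the paper.

import numpy as np

rng = np.random.default_rng(0)
d, width, n = 20, 128, 1000
X = rng.normal(size=(n, d))
w_star = rng.normal(size=d); w_star /= np.linalg.norm(w_star)
y = np.maximum(X @ w_star, 0.0)                  # target depends on a single direction of the input

W = rng.normal(size=(width, d)) / np.sqrt(d)     # first-layer weights
a = rng.normal(size=width) / np.sqrt(width)      # second-layer weights
lr, batch = 0.05, 32

for step in range(2000):
    idx = rng.integers(0, n, size=batch)
    Xb, yb = X[idx], y[idx]
    h = np.maximum(Xb @ W.T, 0.0)                # hidden ReLU activations, shape (batch, width)
    err = h @ a - yb                             # residuals of the squared loss
    a -= lr * (h.T @ err) / batch
    W -= lr * ((err[:, None] * a) * (h > 0)).T @ Xb / batch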
The neural covariance SDE: Shaped infinite depth-and-width networks at initialization
The logit outputs of a feedforward neural network at initialization are conditionally Gaussian,
given a random covariance matrix defined by the penultimate layer. In this work, we study …