How neural networks extrapolate: From feedforward to graph neural networks

K Xu, M Zhang, J Li, SS Du, K Kawarabayashi… - arXiv preprint arXiv …, 2020 - arxiv.org
We study how neural networks trained by gradient descent extrapolate, i.e., what they learn
outside the support of the training distribution. Previous works report mixed empirical results …

High-dimensional dynamics of generalization error in neural networks

MS Advani, AM Saxe, H Sompolinsky - Neural Networks, 2020 - Elsevier
We perform an analysis of the average generalization dynamics of large neural networks
trained using gradient descent. We study the practically relevant “high-dimensional” regime …

Gradient-based feature learning under structured data

A Mousavi-Hosseini, D Wu, T Suzuki… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recent works have demonstrated that the sample complexity of gradient-based learning of
single index models, i.e., functions that depend on a 1-dimensional projection of the input …
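For context, the single index setup named in this snippet is commonly written as follows (generic notation, not necessarily the paper's):

$y = g(\langle \boldsymbol{w}_{\star}, \boldsymbol{x} \rangle) + \varepsilon$, with input $\boldsymbol{x} \in \mathbb{R}^{d}$, an unknown direction $\boldsymbol{w}_{\star} \in \mathbb{S}^{d-1}$, and a link function $g: \mathbb{R} \to \mathbb{R}$,

so the target depends on the input only through a single 1-dimensional projection.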

Rethinking bias-variance trade-off for generalization of neural networks

Z Yang, Y Yu, C You, J Steinhardt… - … on Machine Learning, 2020 - proceedings.mlr.press
The classical bias-variance trade-off predicts that bias decreases and variance increases with
model complexity, leading to a U-shaped risk curve. Recent work calls this into question for …
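As a reminder, the classical decomposition this snippet refers to reads, for squared loss, an estimator $\hat{f}$ of a target $f$, and noise variance $\sigma^2$ (standard textbook form, not the paper's notation):

$\mathbb{E}\big[(y - \hat{f}(\boldsymbol{x}))^2\big] = \big(\mathbb{E}[\hat{f}(\boldsymbol{x})] - f(\boldsymbol{x})\big)^2 + \mathbb{E}\big[(\hat{f}(\boldsymbol{x}) - \mathbb{E}[\hat{f}(\boldsymbol{x})])^2\big] + \sigma^2$,

i.e., risk = bias$^2$ + variance + irreducible noise; the U-shaped curve arises when bias falls while variance grows with model complexity.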

Diffusionshield: A watermark for copyright protection against generative diffusion models

Y Cui, J Ren, H Xu, P He, H Liu, L Sun, Y Xing… - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, Generative Diffusion Models (GDMs) have showcased their remarkable
capabilities in learning and generating images. A large community of GDMs has naturally …

Random features for kernel approximation: A survey on algorithms, theory, and beyond

F Liu, X Huang, Y Chen… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
The class of random features is one of the most popular techniques to speed up kernel
methods in large-scale problems. Related works have been recognized by the NeurIPS Test …
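One of the best-known constructions covered by such surveys is random Fourier features; below is a minimal sketch assuming the Gaussian (RBF) kernel and NumPy — the function name and parameters are illustrative, not the survey's notation.

```python
import numpy as np

def random_fourier_features(X, num_features, gamma, rng):
    """Map X (n x d) to features whose inner products approximate
    the RBF kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    # Frequencies drawn from the kernel's spectral density, phases uniform on [0, 2*pi).
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(d, num_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
Z = random_fourier_features(X, num_features=2000, gamma=0.5, rng=rng)
approx = Z @ Z.T                                    # approximate kernel matrix
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
exact = np.exp(-0.5 * sq_dists)                     # exact RBF kernel with gamma = 0.5
print(np.abs(approx - exact).max())                 # shrinks as num_features grows
```

Linear methods run on the explicit features Z then cost time linear in the number of samples, rather than quadratic as with the full kernel matrix.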

Universality laws for high-dimensional learning with random features

H Hu, YM Lu - IEEE Transactions on Information Theory, 2022 - ieeexplore.ieee.org
We prove a universality theorem for learning with random features. Our result shows that, in
terms of training and generalization errors, a random feature model with a nonlinear …

On the Optimal Weighted Regularization in Overparameterized Linear Regression

D Wu, J Xu - Advances in Neural Information Processing …, 2020 - proceedings.neurips.cc
We consider the linear model $\mathbf{y} = \mathbf{X}\boldsymbol{\beta}_{\star} + \boldsymbol{\epsilon}$ with $\mathbf{X} \in \mathbb{R}^{n \times p}$ in the overparameterized regime $p > n$. We estimate $\boldsymbol{\beta}_{\star}$ …
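The snippet is cut off before the estimator is stated; for orientation only, a generic weighted-ridge estimator of $\boldsymbol{\beta}_{\star}$ takes the form (the specific weighting analyzed in the paper may differ):

$\hat{\boldsymbol{\beta}}_{\mathbf{W},\lambda} = \arg\min_{\boldsymbol{\beta}} \|\mathbf{y} - \mathbf{X}\boldsymbol{\beta}\|_2^2 + \lambda\, \boldsymbol{\beta}^{\top} \mathbf{W} \boldsymbol{\beta} = (\mathbf{X}^{\top}\mathbf{X} + \lambda \mathbf{W})^{-1} \mathbf{X}^{\top} \mathbf{y}$,

where $\mathbf{W} \succ 0$ is the weighting matrix and $\lambda > 0$ the regularization strength.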

Neural networks efficiently learn low-dimensional representations with sgd

A Mousavi-Hosseini, S Park, M Girotti… - arXiv preprint arXiv …, 2022 - arxiv.org
We study the problem of training a two-layer neural network (NN) of arbitrary width using
stochastic gradient descent (SGD) where the input $\boldsymbol{x} \in \mathbb{R}^{d}$ is …
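A generic form of the model and update alluded to here (illustrative notation, not the paper's): a width-$m$ two-layer network $f_{\theta}(\boldsymbol{x}) = \sum_{j=1}^{m} a_j\, \sigma(\boldsymbol{w}_j^{\top} \boldsymbol{x})$ with parameters $\theta = (a_j, \boldsymbol{w}_j)_{j=1}^{m}$, trained by SGD steps $\theta_{t+1} = \theta_t - \eta\, \nabla_{\theta} \ell\big(f_{\theta_t}(\boldsymbol{x}_t), y_t\big)$ on single examples or mini-batches $(\boldsymbol{x}_t, y_t)$.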

The neural covariance SDE: Shaped infinite depth-and-width networks at initialization

M Li, M Nica, D Roy - Advances in Neural Information …, 2022 - proceedings.neurips.cc
The logit outputs of a feedforward neural network at initialization are conditionally Gaussian,
given a random covariance matrix defined by the penultimate layer. In this work, we study …
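The conditional Gaussianity mentioned in this snippet can be made concrete as follows (a standard observation, written in generic notation): if the output layer has i.i.d. Gaussian weights $\mathbf{W} \in \mathbb{R}^{k \times n}$ with variance $1/n$, then conditionally on the penultimate activations $\boldsymbol{h}(\boldsymbol{x}_a) \in \mathbb{R}^{n}$ the logits $\boldsymbol{z}(\boldsymbol{x}_a) = \mathbf{W}\boldsymbol{h}(\boldsymbol{x}_a)$ are jointly Gaussian with $\mathrm{Cov}\big(z_i(\boldsymbol{x}_a), z_j(\boldsymbol{x}_b)\big) = \delta_{ij}\, \boldsymbol{h}(\boldsymbol{x}_a)^{\top}\boldsymbol{h}(\boldsymbol{x}_b)/n$, i.e., the random covariance matrix is determined by the penultimate layer.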