A farewell to the bias-variance tradeoff? An overview of the theory of overparameterized machine learning

Y Dar, V Muthukumar, RG Baraniuk - arXiv preprint arXiv:2109.02355, 2021 - arxiv.org
The rapid recent progress in machine learning (ML) has raised a number of scientific
questions that challenge the longstanding dogma of the field. One of the most important …

The generalization error of random features regression: Precise asymptotics and the double descent curve

S Mei, A Montanari - Communications on Pure and Applied …, 2022 - Wiley Online Library
Deep learning methods operate in regimes that defy the traditional statistical mindset.
Neural network architectures often contain more parameters than training samples, and are …
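
As a hands-on illustration of this setting (not the paper's precise asymptotics), the sketch below fits a random features regression model with a fixed random ReLU first layer and a ridgeless least squares fit of the output weights, sweeping the number of features N across the interpolation threshold N ≈ n, where the test error typically peaks before descending again. The dimensions, noise level, and ReLU activation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: n samples of d-dimensional Gaussian inputs with a noisy linear target.
n, d, n_test = 100, 20, 1000
X, X_test = rng.standard_normal((n, d)), rng.standard_normal((n_test, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.5 * rng.standard_normal(n)
y_test = X_test @ w_true + 0.5 * rng.standard_normal(n_test)

# Sweep the number of random features N across the interpolation threshold N ~ n.
for N in [20, 50, 90, 100, 110, 200, 500, 2000]:
    W = rng.standard_normal((d, N)) / np.sqrt(d)                      # fixed random first layer
    Phi, Phi_test = np.maximum(X @ W, 0), np.maximum(X_test @ W, 0)   # ReLU features
    a = np.linalg.pinv(Phi) @ y                                       # ridgeless (min-norm) fit
    print(f"N = {N:5d}  test MSE = {np.mean((Phi_test @ a - y_test) ** 2):.3f}")
```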

Surprises in high-dimensional ridgeless least squares interpolation

T Hastie, A Montanari, S Rosset, RJ Tibshirani - Annals of Statistics, 2022 - ncbi.nlm.nih.gov
Interpolators—estimators that achieve zero training error—have attracted growing attention
in machine learning, mainly because state-of-the-art neural networks appear to be models of …
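
A minimal picture of the "ridgeless" interpolator studied here, under made-up dimensions: the minimum-ℓ2-norm least squares solution in the p > n regime fits the training data exactly while still generalizing to fresh samples.

```python
import numpy as np

rng = np.random.default_rng(1)

# Overparameterized linear regression with more features (p) than samples (n); sizes made up.
n, p = 50, 200
X = rng.standard_normal((n, p))
beta_true = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Ridgeless (minimum-l2-norm) least squares: the limit of ridge regression as lambda -> 0+.
beta_hat = X.T @ np.linalg.solve(X @ X.T, y)   # equals pinv(X) @ y when X has full row rank

print("training MSE:", np.mean((X @ beta_hat - y) ** 2))   # ~0: the estimator interpolates
X_new = rng.standard_normal((1000, p))
y_new = X_new @ beta_true + 0.1 * rng.standard_normal(1000)
print("test MSE:    ", np.mean((X_new @ beta_hat - y_new) ** 2))
```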

Random features for kernel approximation: A survey on algorithms, theory, and beyond

F Liu, X Huang, Y Chen… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
The class of random features is one of the most popular techniques to speed up kernel
methods in large-scale problems. Related works have been recognized by the NeurIPS Test …
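
To make the speed-up concrete, here is a small sketch of one standard random features construction, random Fourier features for the Gaussian (RBF) kernel; the bandwidth, feature count, and data sizes are arbitrary illustrative choices, and the survey covers many other constructions.

```python
import numpy as np

rng = np.random.default_rng(2)

def rbf_kernel(X, Y, gamma=0.5):
    # Exact Gaussian (RBF) kernel: k(x, y) = exp(-gamma * ||x - y||^2).
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-gamma * sq)

def random_fourier_features(X, D=2000, gamma=0.5):
    # Random Fourier features z(x) = sqrt(2/D) * cos(W x + b); their inner products
    # approximate the RBF kernel in expectation (W drawn from the kernel's spectral density).
    d = X.shape[1]
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, D))
    b = rng.uniform(0, 2 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

X = rng.standard_normal((200, 5))
K_exact = rbf_kernel(X, X)
Z = random_fourier_features(X)
print("max abs kernel approximation error:", np.abs(K_exact - Z @ Z.T).max())
```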

A model of double descent for high-dimensional binary linear classification

Z Deng, A Kammoun… - Information and Inference …, 2022 - academic.oup.com
We consider a model for logistic regression where only a subset of the features is used
for training a linear classifier over the training samples. The classifier is obtained by running …
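
A toy version of this setup can be simulated by training an (essentially) unregularized logistic classifier on progressively larger feature subsets of synthetic data; the subset sizes, label model, and use of scikit-learn here are illustrative assumptions rather than the paper's exact regime.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)

# Illustrative binary data: labels drawn from a logistic model over d Gaussian features.
n, d, n_test = 100, 400, 2000
X, X_test = rng.standard_normal((n, d)), rng.standard_normal((n_test, d))
w = rng.standard_normal(d) / np.sqrt(d)
y = (rng.random(n) < 1 / (1 + np.exp(-4 * X @ w))).astype(int)
y_test = (rng.random(n_test) < 1 / (1 + np.exp(-4 * X_test @ w))).astype(int)

# Train on the first p features only, sweeping p across the interpolation threshold p ~ n.
# A very large C makes the fit essentially unregularized.
for p in [25, 50, 100, 200, 400]:
    clf = LogisticRegression(C=1e6, max_iter=5000).fit(X[:, :p], y)
    err = np.mean(clf.predict(X_test[:, :p]) != y_test)
    print(f"p = {p:4d}  test error = {err:.3f}")
```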

On the Optimal Weighted Regularization in Overparameterized Linear Regression

D Wu, J Xu - Advances in Neural Information Processing …, 2020 - proceedings.neurips.cc
We consider the linear model $y = X\beta_{\star} + \epsilon$ with $X \in \mathbb{R}^{n\times p}$
in the overparameterized regime $p > n$. We estimate $\beta_{\star}$ …
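
The weighted regularization studied here generalizes ridge regression by penalizing each coordinate differently; a minimal sketch of such an estimator in the p > n regime, with arbitrary illustrative weights, is below.

```python
import numpy as np

rng = np.random.default_rng(4)

# Overparameterized linear model y = X beta_star + eps with p > n (illustrative sizes).
n, p = 80, 300
X = rng.standard_normal((n, p))
beta_star = rng.standard_normal(p) / np.sqrt(p)
y = X @ beta_star + 0.3 * rng.standard_normal(n)

def weighted_ridge(X, y, lam, w):
    # Weighted ridge: argmin ||y - X b||^2 + lam * sum_j w_j * b_j^2,
    # with closed form (X^T X + lam * diag(w))^{-1} X^T y.
    return np.linalg.solve(X.T @ X + lam * np.diag(w), X.T @ y)

w_uniform = np.ones(p)               # ordinary ridge
w_skewed = np.linspace(0.1, 10, p)   # hypothetical per-coordinate weights
for name, w in [("uniform", w_uniform), ("skewed", w_skewed)]:
    b = weighted_ridge(X, y, lam=1.0, w=w)
    print(name, "estimation error:", np.linalg.norm(b - beta_star))
```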

Understanding double descent requires a fine-grained bias-variance decomposition

B Adlam, J Pennington - Advances in neural information …, 2020 - proceedings.neurips.cc
Classical learning theory suggests that the optimal generalization performance of a machine
learning model should occur at an intermediate model complexity, with simpler models …
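
The decomposition in the paper is finer-grained than the classical one, but the basic bias-variance split can be estimated empirically by averaging a predictor over independently drawn training sets; the sketch below does this for the minimum-norm least squares interpolator under made-up dimensions.

```python
import numpy as np

rng = np.random.default_rng(5)

# Coarse empirical bias/variance split for the minimum-norm least squares predictor,
# averaged over independently drawn training sets (all sizes are illustrative).
n, p, n_test, n_repeats = 60, 120, 500, 200
beta_star = rng.standard_normal(p) / np.sqrt(p)
X_test = rng.standard_normal((n_test, p))
f_test = X_test @ beta_star              # noiseless targets at fixed test points

preds = np.empty((n_repeats, n_test))
for r in range(n_repeats):
    X = rng.standard_normal((n, p))
    y = X @ beta_star + 0.5 * rng.standard_normal(n)
    preds[r] = X_test @ (np.linalg.pinv(X) @ y)   # min-norm interpolator's predictions

bias_sq = np.mean((preds.mean(axis=0) - f_test) ** 2)
variance = np.mean(preds.var(axis=0))
print(f"bias^2 = {bias_sq:.3f}  variance = {variance:.3f}")
```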

Optimal regularization can mitigate double descent

P Nakkiran, P Venkat, S Kakade, T Ma - arXiv preprint arXiv:2003.01897, 2020 - arxiv.org
Recent empirical and theoretical studies have shown that many learning algorithms, from
linear regression to neural networks, can have test performance that is non-monotonic in …
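
A quick simulation of this effect (with illustrative sizes, and the penalty "tuned" on the test set purely for demonstration): the ridgeless estimator's test error typically spikes near the interpolation threshold p = n, while ridge regression with a well-chosen penalty stays roughly monotone.

```python
import numpy as np

rng = np.random.default_rng(6)

# Ridgeless vs. tuned ridge as the number of features p crosses the threshold p = n.
n, n_test, sigma = 80, 1000, 0.5
lams = np.logspace(-3, 2, 20)

for p in [40, 70, 80, 90, 120, 300]:
    X = rng.standard_normal((n, p))
    beta = rng.standard_normal(p) / np.sqrt(p)
    y = X @ beta + sigma * rng.standard_normal(n)
    X_te = rng.standard_normal((n_test, p))
    y_te = X_te @ beta + sigma * rng.standard_normal(n_test)

    def ridge_mse(lam):
        b = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
        return np.mean((X_te @ b - y_te) ** 2)

    ridgeless = np.mean((X_te @ (np.linalg.pinv(X) @ y) - y_te) ** 2)
    best = min(ridge_mse(lam) for lam in lams)   # "oracle" tuning, for illustration only
    print(f"p = {p:4d}  ridgeless MSE = {ridgeless:7.2f}  tuned-ridge MSE = {best:5.2f}")
```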

Finite-sample analysis of interpolating linear classifiers in the overparameterized regime

NS Chatterji, PM Long - Journal of Machine Learning Research, 2021 - jmlr.org
We prove bounds on the population risk of the maximum margin algorithm for two-class
linear classification. For linearly separable training data, the maximum margin algorithm has …
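
For a concrete instance of the maximum margin algorithm, the sketch below approximates the hard-margin solution on linearly separable, overparameterized synthetic data using a soft-margin linear SVM with a very large penalty parameter C; the data model and sizes are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(7)

# Overparameterized (p > n) two-class data with a class-dependent mean shift, so the
# training set is linearly separable; all sizes and the mean shift are illustrative.
n, p, n_test = 60, 300, 2000
mu = np.zeros(p); mu[0] = 3.0
y = rng.choice([-1, 1], size=n)
X = rng.standard_normal((n, p)) + np.outer(y, mu)
y_test = rng.choice([-1, 1], size=n_test)
X_test = rng.standard_normal((n_test, p)) + np.outer(y_test, mu)

# With a very large C, the soft-margin linear SVM approximates the hard-margin
# (maximum-margin) classifier on separable data.
clf = LinearSVC(C=1e6, max_iter=200000).fit(X, y)
print("training error:", np.mean(clf.predict(X) != y))        # 0 on separable data
print("test error:    ", np.mean(clf.predict(X_test) != y_test))
```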

Overparameterization improves robustness to covariate shift in high dimensions

N Tripuraneni, B Adlam… - Advances in Neural …, 2021 - proceedings.neurips.cc
A significant obstacle in the development of robust machine learning models is covariate
shift, a form of distribution shift that occurs when the input distributions of the …
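
A toy version of covariate shift can be simulated by keeping the labeling function fixed while changing the input covariance between training and test, then sweeping the width of a random features model from under- to over-parameterized; the covariances, sizes, and ReLU features below are illustrative assumptions, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(8)

# Toy covariate shift: training and test inputs have different covariances, while the
# labeling function y = x^T beta_star + noise stays the same (all sizes illustrative).
n, n_test, d = 100, 2000, 20
beta_star = rng.standard_normal(d) / np.sqrt(d)
cov_train = np.diag(np.linspace(1.0, 2.0, d))
cov_test = np.diag(np.linspace(2.0, 0.5, d))     # shifted input distribution

X = rng.multivariate_normal(np.zeros(d), cov_train, size=n)
y = X @ beta_star + 0.3 * rng.standard_normal(n)
X_te = rng.multivariate_normal(np.zeros(d), cov_test, size=n_test)
y_te = X_te @ beta_star + 0.3 * rng.standard_normal(n_test)

# Sweep the number of random ReLU features to move from under- to over-parameterized.
for N in [10, 50, 100, 200, 1000]:
    W = rng.standard_normal((d, N)) / np.sqrt(d)
    a = np.linalg.pinv(np.maximum(X @ W, 0)) @ y              # min-norm top-layer fit
    shift_mse = np.mean((np.maximum(X_te @ W, 0) @ a - y_te) ** 2)
    print(f"N = {N:5d}  test MSE under covariate shift = {shift_mse:.3f}")
```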