Model collapse demystified: The case of regression

E Dohmatob, Y Feng, J Kempe - Advances in Neural …, 2025 - proceedings.neurips.cc
The proliferation of large language and image generation models raises the question
of what happens if models are trained on the synthesized outputs of other models. The …
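
As a rough illustration of the phenomenon in the regression setting this paper analyzes, the sketch below (with purely illustrative dimensions, sample size, and noise level) retrains a linear model generation after generation on labels produced by the previous generation's fit, tracking test error against the original ground truth.

    import numpy as np

    rng = np.random.default_rng(0)
    d, n, sigma = 20, 200, 0.5                     # illustrative sizes, not the paper's setting
    w_true = rng.normal(size=d)
    X_test = rng.normal(size=(1000, d))

    w_hat = w_true.copy()                          # generation 0 labels come from the true model
    for gen in range(6):
        X = rng.normal(size=(n, d))
        y = X @ w_hat + sigma * rng.normal(size=n) # labels synthesized by the previous generation
        w_hat = np.linalg.lstsq(X, y, rcond=None)[0]
        test_mse = np.mean((X_test @ (w_hat - w_true)) ** 2)
        print(f"generation {gen}: test MSE vs. ground truth = {test_mse:.3f}")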

Generalization error rates in kernel regression: The crossover from the noiseless to noisy regime

H Cui, B Loureiro, F Krzakala… - Advances in Neural …, 2021 - proceedings.neurips.cc
In this manuscript we consider Kernel Ridge Regression (KRR) under the Gaussian design.
Exponents for the decay of the excess generalization error of KRR have been reported in …
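
For orientation, a bare-bones KRR fit under a Gaussian design is sketched below; the RBF kernel, ridge parameter, and target function are placeholder choices, not the paper's setting.

    import numpy as np

    def rbf_kernel(A, B, gamma=1.0):
        sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * sq)

    rng = np.random.default_rng(0)
    n, d, lam, sigma = 200, 5, 1e-3, 0.1
    X = rng.normal(size=(n, d))                    # Gaussian design
    y = np.sin(X[:, 0]) + sigma * rng.normal(size=n)

    alpha = np.linalg.solve(rbf_kernel(X, X) + n * lam * np.eye(n), y)  # KRR dual coefficients
    X_new = rng.normal(size=(2000, d))
    f_new = rbf_kernel(X_new, X) @ alpha
    print("excess test error:", np.mean((f_new - np.sin(X_new[:, 0])) ** 2))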

Benign overfitting of constant-stepsize SGD for linear regression

D Zou, J Wu, V Braverman, Q Gu… - … on Learning Theory, 2021 - proceedings.mlr.press
There is an increasing realization that algorithmic inductive biases are central to preventing
overfitting; empirically, we often see a benign overfitting phenomenon in overparameterized …
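
A toy version of the setting, with assumed sizes: one pass of constant-stepsize SGD with iterate averaging on an overparameterized Gaussian linear regression problem, reporting the training error and the excess risk of the averaged iterate.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, sigma = 100, 500, 0.1                    # overparameterized: d >> n (assumption)
    eta = 0.5 / d                                  # constant stepsize below 1/tr(H) = 1/d
    w_true = rng.normal(size=d) / np.sqrt(d)
    X = rng.normal(size=(n, d))
    y = X @ w_true + sigma * rng.normal(size=n)

    w, w_sum = np.zeros(d), np.zeros(d)
    for i in range(n):                             # single pass, constant stepsize
        w -= eta * (X[i] @ w - y[i]) * X[i]
        w_sum += w
    w_avg = w_sum / n                              # iterate averaging

    print("train MSE  :", np.mean((X @ w_avg - y) ** 2))
    print("excess risk:", np.sum((w_avg - w_true) ** 2))  # = ||w_avg - w_true||^2 under identity covariance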

Near-interpolators: Rapid norm growth and the trade-off between interpolation and generalization

Y Wang, R Sonthalia, W Hu - International Conference on …, 2024 - proceedings.mlr.press
We study the generalization capability of nearly-interpolating linear regressors: $\beta$'s
whose training error $\tau$ is positive but small, i.e., below the noise floor. Under a random …
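
A small numerical illustration under assumed dimensions: ridge regressors with shrinking regularization push the training error τ below the noise level σ² = 1 while the norm of β grows rapidly.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, sigma = 100, 300, 1.0                    # d > n so near-interpolation is possible (assumption)
    X = rng.normal(size=(n, d)) / np.sqrt(d)
    y = X @ rng.normal(size=d) + sigma * rng.normal(size=n)

    for lam in [1e-1, 1e-2, 1e-3, 1e-4]:
        beta = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)   # ridge regressor
        tau = np.mean((X @ beta - y) ** 2)         # training error, pushed below the noise floor sigma^2
        print(f"lam={lam:.0e}  tau={tau:.3f}  ||beta||={np.linalg.norm(beta):.1f}")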

Scaling laws in linear regression: Compute, parameters, and data

L Lin, J Wu, SM Kakade, PL Bartlett, JD Lee - arXiv preprint arXiv …, 2024 - arxiv.org
Empirically, large-scale deep learning models often satisfy a neural scaling law: the test
error of the trained model improves polynomially as the model size and data size grow …
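
A rough numerical analogue under an assumed power-law feature spectrum: the excess risk of a ridge fit in linear regression decays polynomially as the sample size grows.

    import numpy as np

    rng = np.random.default_rng(0)
    d, sigma = 400, 0.1
    spec = np.arange(1, d + 1) ** -2.0             # power-law feature spectrum (assumption)
    w_true = rng.normal(size=d)

    for n in [50, 100, 200, 400, 800]:
        X = rng.normal(size=(n, d)) * np.sqrt(spec)
        y = X @ w_true + sigma * rng.normal(size=n)
        w = np.linalg.solve(X.T @ X + 1e-3 * np.eye(d), X.T @ y)     # ridge fit
        risk = np.sum(spec * (w - w_true) ** 2)    # population excess risk under this covariance
        print(f"n={n:4d}  excess risk={risk:.4f}")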

The high line: Exact risk and learning rate curves of stochastic adaptive learning rate algorithms

E Collins-Woodfin, I Seroussi… - Advances in …, 2025 - proceedings.neurips.cc
We develop a framework for analyzing the training and learning rate dynamics on a large
class of high-dimensional optimization problems, which we call the high line, trained using …
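
As one concrete member of that class, the sketch below runs AdaGrad-Norm (a single adaptive scalar stepsize) on a streaming least-squares problem and tracks the risk and effective learning rate; all constants are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    d, sigma, b0, eta0 = 500, 0.1, 1.0, 1.0        # illustrative constants (assumption)
    w_true = rng.normal(size=d) / np.sqrt(d)

    w, g2 = np.zeros(d), b0
    for t in range(1, 5001):                       # streaming least squares
        x = rng.normal(size=d)
        grad = (x @ w - (x @ w_true + sigma * rng.normal())) * x
        g2 += grad @ grad                          # accumulate squared gradient norms
        w -= eta0 / np.sqrt(g2) * grad             # AdaGrad-Norm update
        if t % 1000 == 0:
            print(f"t={t}: risk={np.sum((w - w_true) ** 2):.4f}  lr={eta0 / np.sqrt(g2):.2e}")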

Last iterate convergence of SGD for Least-Squares in the Interpolation regime.

AV Varre, L Pillaud-Vivien… - Advances in Neural …, 2021 - proceedings.neurips.cc
Motivated by the recent successes of neural networks that have the ability to fit the data
perfectly and generalize well, we study the noiseless model in the fundamental least …
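
A small noiseless instance with assumed sizes: labels are exactly linear in the features, so the data can be fit perfectly, and the last SGD iterate itself (no averaging) drives both training and parameter error toward zero.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 200, 50
    w_true = rng.normal(size=d) / np.sqrt(d)
    X = rng.normal(size=(n, d))
    y = X @ w_true                                 # noiseless: the model can interpolate the data

    w, eta = np.zeros(d), 1.0 / (2 * d)            # constant stepsize (assumption)
    for epoch in range(50):
        for i in rng.permutation(n):               # multi-pass SGD, last iterate only
            w -= eta * (X[i] @ w - y[i]) * X[i]
    print("train MSE      :", np.mean((X @ w - y) ** 2))
    print("parameter error:", np.sum((w - w_true) ** 2))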

Capacity dependent analysis for functional online learning algorithms

X Guo, ZC Guo, L Shi - Applied and Computational Harmonic Analysis, 2023 - Elsevier
This article provides a convergence analysis of online stochastic gradient descent algorithms
for functional linear models. Adopting the characterizations of the slope function regularity …
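
A crude discretized sketch of such an algorithm, with the grid, predictor process, and stepsize exponent all illustrative assumptions: online SGD on a functional linear model y = ∫ X(s) β(s) ds + noise.

    import numpy as np

    rng = np.random.default_rng(0)
    grid = np.linspace(0, 1, 200)                  # discretize [0, 1]
    dt = grid[1] - grid[0]
    beta_true = np.sin(2 * np.pi * grid)           # true slope function
    beta = np.zeros_like(grid)

    for t in range(1, 5001):
        X_t = np.cumsum(rng.normal(size=grid.size)) * np.sqrt(dt)   # Brownian-motion predictor curve
        y_t = np.sum(X_t * beta_true) * dt + 0.1 * rng.normal()
        resid = np.sum(X_t * beta) * dt - y_t
        beta -= t ** -0.6 * resid * X_t            # polynomially decaying stepsize
    print("L2 error of slope estimate:", np.sqrt(np.sum((beta - beta_true) ** 2) * dt))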

Last iterate risk bounds of SGD with decaying stepsize for overparameterized linear regression

J Wu, D Zou, V Braverman, Q Gu… - … on Machine Learning, 2022 - proceedings.mlr.press
Stochastic gradient descent (SGD) has been shown to generalize well in many deep
learning applications. In practice, one often runs SGD with a geometrically decaying …
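
The schedule in question, sketched with assumed sizes and constants: a constant stepsize within each stage, halved after every stage, for SGD on overparameterized linear regression.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, sigma = 200, 500, 0.1
    w_true = rng.normal(size=d) / np.sqrt(d)
    X = rng.normal(size=(n, d))
    y = X @ w_true + sigma * rng.normal(size=n)

    w, eta, stages = np.zeros(d), 0.5 / d, 6
    for s in range(stages):                        # geometrically decaying stepsize schedule
        for i in rng.permutation(n):
            w -= eta * (X[i] @ w - y[i]) * X[i]
        eta *= 0.5                                 # halve the stepsize after each stage
    print("last-iterate excess risk:", np.sum((w - w_true) ** 2))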

Statistical optimality of divide and conquer kernel-based functional linear regression

J Liu, L Shi - Journal of Machine Learning Research, 2024 - jmlr.org
Previous analysis of regularized functional linear regression in a reproducing kernel Hilbert
space (RKHS) typically requires the target function to be contained in this kernel space. This …
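
To fix ideas, here is the divide-and-conquer recipe in the simpler scalar kernel ridge regression setting (not the functional model studied here): split the data into blocks, fit a local KRR estimator on each, and average the predictions; sizes and kernel are assumptions.

    import numpy as np

    def rbf(A, B, gamma=2.0):
        sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
        return np.exp(-gamma * sq)

    rng = np.random.default_rng(0)
    N, m, lam = 1200, 6, 1e-2                      # m local blocks/machines (assumption)
    X = rng.uniform(size=(N, 1))
    y = np.sin(4 * X[:, 0]) + 0.2 * rng.normal(size=N)
    X_test = rng.uniform(size=(500, 1))

    preds = []
    for Xb, yb in zip(np.array_split(X, m), np.array_split(y, m)):
        nb = len(yb)
        alpha = np.linalg.solve(rbf(Xb, Xb) + nb * lam * np.eye(nb), yb)   # local KRR fit
        preds.append(rbf(X_test, Xb) @ alpha)
    f_bar = np.mean(preds, axis=0)                 # average the local estimators
    print("test MSE:", np.mean((f_bar - np.sin(4 * X_test[:, 0])) ** 2))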