Benign overfitting in two-layer convolutional neural networks

Y Cao, Z Chen, M Belkin, Q Gu - Advances in Neural …, 2022 - proceedings.neurips.cc
Modern neural networks often have great expressive power and can be trained to overfit the
training data while still achieving good test performance. This phenomenon is referred to …

Benign overfitting in two-layer ReLU convolutional neural networks

Y Kou, Z Chen, Y Chen, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
Modern deep learning models with great expressive power can be trained to overfit the
training data but still generalize well. This phenomenon is referred to as benign overfitting …

Implicit bias of gradient descent for two-layer ReLU and leaky ReLU networks on nearly-orthogonal data

Y Kou, Z Chen, Q Gu - Advances in Neural Information …, 2023 - proceedings.neurips.cc
The implicit bias towards solutions with favorable properties is believed to be a key reason
why neural networks trained by gradient-based optimization can generalize well. While the …
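
For intuition about the "nearly-orthogonal data" regime in the title: independent high-dimensional Gaussian samples become almost pairwise orthogonal as the dimension grows. A minimal numpy sketch; the sample count and dimensions are arbitrary illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 20  # number of sample points
for d in (10, 100, 10_000):
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # project onto the unit sphere
    G = X @ X.T                                      # pairwise cosine similarities
    off_diag = np.abs(G[~np.eye(n, dtype=bool)])
    print(f"d={d:6d}: max |cos(angle)| between distinct points = {off_diag.max():.3f}")
```

As d increases, the off-diagonal similarities concentrate around zero at roughly the 1/sqrt(d) rate, which is the sense in which high-dimensional data is "nearly orthogonal".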

Why does sharpness-aware minimization generalize better than SGD?

Z Chen, J Zhang, Y Kou, X Chen… - Advances in Neural …, 2024 - proceedings.neurips.cc
The challenge of overfitting, in which the model memorizes the training data and fails to
generalize to test data, has become increasingly significant in the training of large neural …
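
For reference, sharpness-aware minimization (SAM) replaces the plain gradient step with a two-step update: ascend to a worst-case point within a small ball around the current weights, then descend using the gradient computed there. A minimal sketch on a toy quadratic loss; the loss, step size, and radius rho are illustrative assumptions, not the paper's setting:

```python
import numpy as np

def loss_grad(w):
    """Toy quadratic loss 0.5 * w^T A w and its gradient (illustrative only)."""
    A = np.diag([10.0, 1.0])            # ill-conditioned curvature
    return 0.5 * w @ A @ w, A @ w

def sam_step(w, lr=0.05, rho=0.1):
    # 1) ascend to the (linearized) worst-case point in an L2 ball of radius rho
    _, g = loss_grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # 2) descend using the gradient evaluated at the perturbed point
    _, g_adv = loss_grad(w + eps)
    return w - lr * g_adv

w = np.array([1.0, 1.0])
for _ in range(100):
    w = sam_step(w)
print("final loss:", loss_grad(w)[0])
```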

Limitations of the NTK for understanding generalization in deep learning

N Vyas, Y Bansal, P Nakkiran - arXiv preprint arXiv:2206.10012, 2022 - arxiv.org
The "Neural Tangent Kernel" (NTK) (Jacot et al., 2018) and its empirical variants have been
proposed as proxies to capture certain behaviors of real neural networks. In this work, we …
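
For context, the empirical NTK at parameters θ is the Gram matrix K(x, x') = ⟨∇_θ f(x; θ), ∇_θ f(x'; θ)⟩. A minimal numpy sketch for a two-layer ReLU network; the width, scaling, and inputs are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 1000                           # input dimension, hidden width
W = rng.standard_normal((m, d)) / np.sqrt(d)
a = rng.standard_normal(m) / np.sqrt(m)

def param_grad(x):
    """Gradient of f(x) = a . relu(W x) w.r.t. all parameters, flattened."""
    pre = W @ x
    act = np.maximum(pre, 0.0)
    dW = np.outer(a * (pre > 0), x)      # shape (m, d)
    da = act                             # shape (m,)
    return np.concatenate([dW.ravel(), da])

def empirical_ntk(X):
    """K[i, j] = <grad f(x_i), grad f(x_j)> at the current parameters."""
    G = np.stack([param_grad(x) for x in X])
    return G @ G.T

X = rng.standard_normal((4, d))
print(np.round(empirical_ntk(X), 3))
```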

Understanding benign overfitting in gradient-based meta learning

L Chen, S Lu, T Chen - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Meta learning has demonstrated tremendous success in few-shot learning with limited
supervised data. In those settings, the meta model is usually overparameterized. While the …
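
As a reminder of the gradient-based meta-learning template analyzed here, a MAML-style inner/outer loop on toy linear-regression tasks. This sketch uses a first-order approximation (the inner-step Jacobian is dropped), and the task distribution and step sizes are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

def task_loss_grad(w, X, y):
    r = X @ w - y
    return 0.5 * np.mean(r ** 2), X.T @ r / len(y)

def maml_outer_grad(w, tasks, inner_lr=0.1):
    """Average gradient of the post-adaptation loss w.r.t. the meta-parameters
    (first-order approximation: the inner-update Jacobian is ignored)."""
    g_outer = np.zeros_like(w)
    for X_tr, y_tr, X_val, y_val in tasks:
        _, g = task_loss_grad(w, X_tr, y_tr)
        w_adapt = w - inner_lr * g                  # one inner gradient step
        _, g_val = task_loss_grad(w_adapt, X_val, y_val)
        g_outer += g_val
    return g_outer / len(tasks)

# tasks cluster around a shared center, so a good meta-init exists
base = rng.standard_normal(d)
def sample_task():
    w_task = base + 0.3 * rng.standard_normal(d)    # noiseless labels for simplicity
    X_tr, X_val = rng.standard_normal((10, d)), rng.standard_normal((10, d))
    return X_tr, X_tr @ w_task, X_val, X_val @ w_task

w = np.zeros(d)
for _ in range(200):
    w -= 0.05 * maml_outer_grad(w, [sample_task() for _ in range(8)])
print("distance of meta-init from task center:", np.linalg.norm(w - base))
```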

Generalization in kernel regression under realistic assumptions

D Barzilai, O Shamir - arXiv preprint arXiv:2312.15995, 2023 - arxiv.org
It is by now well-established that modern over-parameterized models seem to elude the
bias-variance tradeoff and generalize well despite overfitting noise. Many recent works attempt to …
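
For concreteness, the "overfitting noise" regime corresponds to ridgeless kernel regression, where the predictor interpolates noisy labels exactly. A minimal numpy sketch with an RBF kernel; the kernel, bandwidth, and noise level are arbitrary illustrative choices, not the paper's assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, gamma=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# noisy training data from a smooth target
n, d = 50, 2
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(n)     # 0.3 = label noise

# ridge = 0 gives the minimum-RKHS-norm interpolant: it fits the noise exactly
ridge = 0.0
alpha = np.linalg.solve(rbf_kernel(X, X) + ridge * np.eye(n), y)

print("max train error:", np.abs(rbf_kernel(X, X) @ alpha - y).max())  # ~0
X_test = rng.standard_normal((200, d))
pred = rbf_kernel(X_test, X) @ alpha
print("test MSE vs noiseless target:", np.mean((pred - np.sin(X_test[:, 0])) ** 2))
```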

Deep linear networks can benignly overfit when shallow ones do

NS Chatterji, PM Long - Journal of Machine Learning Research, 2023 - jmlr.org
We bound the excess risk of interpolating deep linear networks trained using gradient flow.
In a setting previously used to establish risk bounds for the minimum ℓ2-norm interpolant, we …
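
For reference, the minimum ℓ2-norm interpolant mentioned here has the closed form ŵ = Xᵀ(XXᵀ)⁻¹y when there are fewer samples than dimensions. A minimal numpy sketch; the isotropic covariates below are only for illustration, whereas benign-overfitting guarantees of this kind depend on specific covariance conditions:

```python
import numpy as np

rng = np.random.default_rng(0)

# overparameterized regime: fewer samples than dimensions (n < d)
n, d = 20, 500
w_star = np.zeros(d); w_star[0] = 1.0             # ground-truth signal direction
X = rng.standard_normal((n, d))
y = X @ w_star + 0.5 * rng.standard_normal(n)     # noisy labels

# minimum l2-norm solution of X w = y (equivalently np.linalg.pinv(X) @ y)
w_hat = X.T @ np.linalg.solve(X @ X.T, y)

print("train residual:", np.linalg.norm(X @ w_hat - y))   # ~0: interpolation
X_test = rng.standard_normal((1000, d))
print("test MSE:", np.mean((X_test @ w_hat - X_test @ w_star) ** 2))
```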

Phase transition from clean training to adversarial training

Y Xing, Q Song, G Cheng - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Adversarial training is an important algorithm for achieving robust machine learning models.
However, numerous empirical results show a great performance degradation from clean …
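
For context, adversarial training solves a min-max problem: perturb each input to increase the loss, then update the model on the perturbed batch. A minimal sketch with a one-step FGSM inner maximization on a linear logistic model; this is a simplification for illustration, not necessarily the paper's setting or attack budget:

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic_loss_grads(w, X, y):
    """Mean logistic loss, gradient w.r.t. w, and gradient w.r.t. inputs X."""
    margins = y * (X @ w)
    s = -y / (1.0 + np.exp(margins))          # d loss / d margin, per sample
    grad_w = X.T @ s / len(y)
    grad_X = s[:, None] * w[None, :]
    return np.mean(np.log1p(np.exp(-margins))), grad_w, grad_X

def adv_train_step(w, X, y, lr=0.1, eps=0.1):
    # inner maximization (one FGSM step): perturb inputs to increase the loss
    _, _, gX = logistic_loss_grads(w, X, y)
    X_adv = X + eps * np.sign(gX)
    # outer minimization: descend on the adversarial examples
    _, gw, _ = logistic_loss_grads(w, X_adv, y)
    return w - lr * gw

X = rng.standard_normal((100, 10))
y = np.sign(X[:, 0] + 0.1 * rng.standard_normal(100))
w = np.zeros(10)
for _ in range(200):
    w = adv_train_step(w, X, y)
print("robust training loss:", logistic_loss_grads(w, X, y)[0])
```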

Unveil benign overfitting for transformer in vision: Training dynamics, convergence, and generalization

J Jiang, W Huang, M Zhang, T Suzuki, L Nie - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers have demonstrated great power in the recent development of large
foundation models. In particular, the Vision Transformer (ViT) has brought revolutionary …
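
For background, the core operation inside a ViT block is self-attention over a sequence of patch embeddings. A minimal single-head numpy sketch; the dimensions and initialization scales are arbitrary illustrative choices, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over patch embeddings X of shape (n, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]))   # (n, n) attention weights
    return A @ V

n_patches, d = 16, 8                             # a ViT tokenizes an image into patches
X = rng.standard_normal((n_patches, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)       # (16, 8)
```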