Benign overfitting in two-layer convolutional neural networks

Y Cao, Z Chen, M Belkin, Q Gu - Advances in Neural …, 2022 - proceedings.neurips.cc
Modern neural networks often have great expressive power and can be trained to overfit the
training data while still achieving good test performance. This phenomenon is referred to …

Benign overfitting in two-layer ReLU convolutional neural networks

Y Kou, Z Chen, Y Chen, Q Gu - International Conference on …, 2023 - proceedings.mlr.press
Modern deep learning models with great expressive power can be trained to overfit the
training data but still generalize well. This phenomenon is referred to as benign overfitting …

Implicit bias of gradient descent for two-layer ReLU and leaky ReLU networks on nearly-orthogonal data

Y Kou, Z Chen, Q Gu - Advances in Neural Information …, 2023 - proceedings.neurips.cc
The implicit bias towards solutions with favorable properties is believed to be a key reason
why neural networks trained by gradient-based optimization can generalize well. While the …
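
For intuition about the "nearly-orthogonal data" regime in the title: independent high-dimensional Gaussian samples become almost pairwise orthogonal as the dimension grows. A minimal numpy sketch; the sample count and dimensions are arbitrary illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

n = 20  # number of sample points
for d in (10, 100, 10_000):
    X = rng.standard_normal((n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)   # project onto the unit sphere
    G = X @ X.T                                      # pairwise cosine similarities
    off_diag = np.abs(G[~np.eye(n, dtype=bool)])
    print(f"d={d:6d}: max |cos(angle)| between distinct points = {off_diag.max():.3f}")
```

As d increases, the off-diagonal similarities concentrate around zero at roughly the 1/sqrt(d) rate, which is the sense in which high-dimensional data is "nearly orthogonal".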

Why does sharpness-aware minimization generalize better than SGD?

Z Chen, J Zhang, Y Kou, X Chen… - Advances in Neural …, 2024 - proceedings.neurips.cc
The challenge of overfitting, in which the model memorizes the training data and fails to
generalize to test data, has become increasingly significant in the training of large neural …
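
For reference, sharpness-aware minimization (SAM) replaces the plain gradient step with a two-step update: ascend to a worst-case point within a small ball around the current weights, then descend using the gradient computed there. A minimal sketch on a toy quadratic loss; the loss, step size, and radius rho are illustrative assumptions, not the paper's setting:

```python
import numpy as np

def loss_grad(w):
    """Toy quadratic loss 0.5 * w^T A w and its gradient (illustrative only)."""
    A = np.diag([10.0, 1.0])            # ill-conditioned curvature
    return 0.5 * w @ A @ w, A @ w

def sam_step(w, lr=0.05, rho=0.1):
    # 1) ascend to the (linearized) worst-case point in an L2 ball of radius rho
    _, g = loss_grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    # 2) descend using the gradient evaluated at the perturbed point
    _, g_adv = loss_grad(w + eps)
    return w - lr * g_adv

w = np.array([1.0, 1.0])
for _ in range(100):
    w = sam_step(w)
print("final loss:", loss_grad(w)[0])
```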

Limitations of the NTK for understanding generalization in deep learning

N Vyas, Y Bansal, P Nakkiran - arXiv preprint arXiv:2206.10012, 2022 - arxiv.org
The "Neural Tangent Kernel" (NTK) (Jacot et al., 2018) and its empirical variants have been
proposed as proxies to capture certain behaviors of real neural networks. In this work, we …
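
For context, the empirical NTK at parameters θ is the Gram matrix K(x, x') = ⟨∇_θ f(x; θ), ∇_θ f(x'; θ)⟩. A minimal numpy sketch for a two-layer ReLU network; the width, scaling, and inputs are illustrative choices, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
d, m = 5, 1000                           # input dimension, hidden width
W = rng.standard_normal((m, d)) / np.sqrt(d)
a = rng.standard_normal(m) / np.sqrt(m)

def param_grad(x):
    """Gradient of f(x) = a . relu(W x) w.r.t. all parameters, flattened."""
    pre = W @ x
    act = np.maximum(pre, 0.0)
    dW = np.outer(a * (pre > 0), x)      # shape (m, d)
    da = act                             # shape (m,)
    return np.concatenate([dW.ravel(), da])

def empirical_ntk(X):
    """K[i, j] = <grad f(x_i), grad f(x_j)> at the current parameters."""
    G = np.stack([param_grad(x) for x in X])
    return G @ G.T

X = rng.standard_normal((4, d))
print(np.round(empirical_ntk(X), 3))
```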

Understanding benign overfitting in gradient-based meta learning

L Chen, S Lu, T Chen - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Meta learning has demonstrated tremendous success in few-shot learning with limited
supervised data. In those settings, the meta model is usually overparameterized. While the …
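
As a reminder of the gradient-based meta-learning template analyzed here, a MAML-style inner/outer loop on toy linear-regression tasks. This sketch uses a first-order approximation (the inner-step Jacobian is dropped), and the task distribution and step sizes are invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

def task_loss_grad(w, X, y):
    r = X @ w - y
    return 0.5 * np.mean(r ** 2), X.T @ r / len(y)

def maml_outer_grad(w, tasks, inner_lr=0.1):
    """Average gradient of the post-adaptation loss w.r.t. the meta-parameters
    (first-order approximation: the inner-update Jacobian is ignored)."""
    g_outer = np.zeros_like(w)
    for X_tr, y_tr, X_val, y_val in tasks:
        _, g = task_loss_grad(w, X_tr, y_tr)
        w_adapt = w - inner_lr * g                  # one inner gradient step
        _, g_val = task_loss_grad(w_adapt, X_val, y_val)
        g_outer += g_val
    return g_outer / len(tasks)

# tasks cluster around a shared center, so a good meta-init exists
base = rng.standard_normal(d)
def sample_task():
    w_task = base + 0.3 * rng.standard_normal(d)    # noiseless labels for simplicity
    X_tr, X_val = rng.standard_normal((10, d)), rng.standard_normal((10, d))
    return X_tr, X_tr @ w_task, X_val, X_val @ w_task

w = np.zeros(d)
for _ in range(200):
    w -= 0.05 * maml_outer_grad(w, [sample_task() for _ in range(8)])
print("distance of meta-init from task center:", np.linalg.norm(w - base))
```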

Generalization in kernel regression under realistic assumptions

D Barzilai, O Shamir - arXiv preprint arXiv:2312.15995, 2023 - arxiv.org
It is by now well-established that modern over-parameterized models seem to elude the
bias-variance tradeoff and generalize well despite overfitting noise. Many recent works attempt to …
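
For concreteness, the "overfitting noise" regime corresponds to ridgeless kernel regression, where the predictor interpolates noisy labels exactly. A minimal numpy sketch with an RBF kernel; the kernel, bandwidth, and noise level are arbitrary illustrative choices, not the paper's assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(A, B, gamma=1.0):
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

# noisy training data from a smooth target
n, d = 50, 2
X = rng.standard_normal((n, d))
y = np.sin(X[:, 0]) + 0.3 * rng.standard_normal(n)     # 0.3 = label noise

# ridge = 0 gives the minimum-RKHS-norm interpolant: it fits the noise exactly
ridge = 0.0
alpha = np.linalg.solve(rbf_kernel(X, X) + ridge * np.eye(n), y)

print("max train error:", np.abs(rbf_kernel(X, X) @ alpha - y).max())  # ~0
X_test = rng.standard_normal((200, d))
pred = rbf_kernel(X_test, X) @ alpha
print("test MSE vs noiseless target:", np.mean((pred - np.sin(X_test[:, 0])) ** 2))
```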

Deep linear networks can benignly overfit when shallow ones do

NS Chatterji, PM Long - Journal of Machine Learning Research, 2023 - jmlr.org
We bound the excess risk of interpolating deep linear networks trained using gradient flow.
In a setting previously used to establish risk bounds for the minimum ℓ2-norm interpolant, we …
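
For reference, the minimum ℓ2-norm interpolant mentioned here has the closed form ŵ = Xᵀ(XXᵀ)⁻¹y when there are fewer samples than dimensions. A minimal numpy sketch; the isotropic covariates below are only for illustration, whereas benign-overfitting guarantees of this kind depend on specific covariance conditions:

```python
import numpy as np

rng = np.random.default_rng(0)

# overparameterized regime: fewer samples than dimensions (n < d)
n, d = 20, 500
w_star = np.zeros(d); w_star[0] = 1.0             # ground-truth signal direction
X = rng.standard_normal((n, d))
y = X @ w_star + 0.5 * rng.standard_normal(n)     # noisy labels

# minimum l2-norm solution of X w = y (equivalently np.linalg.pinv(X) @ y)
w_hat = X.T @ np.linalg.solve(X @ X.T, y)

print("train residual:", np.linalg.norm(X @ w_hat - y))   # ~0: interpolation
X_test = rng.standard_normal((1000, d))
print("test MSE:", np.mean((X_test @ w_hat - X_test @ w_star) ** 2))
```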

Phase transition from clean training to adversarial training

Y Xing, Q Song, G Cheng - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Adversarial training is an important algorithm for achieving robust machine learning models.
However, numerous empirical results show a great performance degradation from clean …
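
For context, adversarial training solves a min-max problem: perturb each input to increase the loss, then update the model on the perturbed batch. A minimal sketch with a one-step FGSM inner maximization on a linear logistic model; this is a simplification for illustration, not necessarily the paper's setting or attack budget:

```python
import numpy as np

rng = np.random.default_rng(0)

def logistic_loss_grads(w, X, y):
    """Mean logistic loss, gradient w.r.t. w, and gradient w.r.t. inputs X."""
    margins = y * (X @ w)
    s = -y / (1.0 + np.exp(margins))          # d loss / d margin, per sample
    grad_w = X.T @ s / len(y)
    grad_X = s[:, None] * w[None, :]
    return np.mean(np.log1p(np.exp(-margins))), grad_w, grad_X

def adv_train_step(w, X, y, lr=0.1, eps=0.1):
    # inner maximization (one FGSM step): perturb inputs to increase the loss
    _, _, gX = logistic_loss_grads(w, X, y)
    X_adv = X + eps * np.sign(gX)
    # outer minimization: descend on the adversarial examples
    _, gw, _ = logistic_loss_grads(w, X_adv, y)
    return w - lr * gw

X = rng.standard_normal((100, 10))
y = np.sign(X[:, 0] + 0.1 * rng.standard_normal(100))
w = np.zeros(10)
for _ in range(200):
    w = adv_train_step(w, X, y)
print("robust training loss:", logistic_loss_grads(w, X, y)[0])
```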

Unveil benign overfitting for transformer in vision: Training dynamics, convergence, and generalization

J Jiang, W Huang, M Zhang, T Suzuki, L Nie - arXiv preprint arXiv …, 2024 - arxiv.org
Transformers have demonstrated great power in the recent development of large
foundation models. In particular, the Vision Transformer (ViT) has brought revolutionary …
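
For background, the core operation inside a ViT block is self-attention over a sequence of patch embeddings. A minimal single-head numpy sketch; the dimensions and initialization scales are arbitrary illustrative choices, not the paper's model:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """Single-head self-attention over patch embeddings X of shape (n, d)."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(K.shape[1]))   # (n, n) attention weights
    return A @ V

n_patches, d = 16, 8                             # a ViT tokenizes an image into patches
X = rng.standard_normal((n_patches, d))
Wq, Wk, Wv = (rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)       # (16, 8)
```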