Benign overfitting in two-layer convolutional neural networks
Modern neural networks often have great expressive power and can be trained to overfit the
training data, while still achieving a good test performance. This phenomenon is referred to …
Benign overfitting in two-layer ReLU convolutional neural networks
Modern deep learning models with great expressive power can be trained to overfit the
training data but still generalize well. This phenomenon is referred to as benign overfitting …
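For a concrete picture of the model class these two entries analyze, here is a minimal sketch of a two-layer ReLU convolutional network: shared filters applied to input patches, with the second layer held fixed. The filter count, patch layout, and fixed +/-1 output weights are illustrative assumptions, not the papers' exact setup.

```python
import numpy as np

rng = np.random.default_rng(0)
m, d, P = 10, 20, 2                    # filters, patch dimension, patches per input

W = rng.normal(size=(m, d)) * 0.01     # shared first-layer (convolutional) filters
a = np.repeat([1.0, -1.0], m // 2)     # second layer fixed to +/-1, a common analysis device

def f(x):
    """Two-layer ReLU CNN: x is a (P, d) array of patches sharing the filters W."""
    h = np.maximum(x @ W.T, 0.0)       # (P, m) ReLU filter responses
    return float(np.sum(h @ a))        # sum over patches and filters

x = rng.normal(size=(P, d))            # e.g. one signal patch plus one noise patch
print(f(x))
```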
Implicit bias of gradient descent for two-layer ReLU and leaky ReLU networks on nearly-orthogonal data
The implicit bias towards solutions with favorable properties is believed to be a key reason
why neural networks trained by gradient-based optimization can generalize well. While the …
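For intuition on the "nearly-orthogonal" regime in this title: i.i.d. high-dimensional Gaussian inputs have pairwise cosine similarities of order 1/sqrt(d), so they become almost orthogonal as the dimension grows. A quick numerical check (sample count and dimensions chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20                                          # number of samples

for d in (10, 1_000, 100_000):                  # input dimension
    X = rng.normal(size=(n, d))
    X /= np.linalg.norm(X, axis=1, keepdims=True)
    G = X @ X.T                                 # pairwise cosine similarities
    off_diag = np.abs(G[~np.eye(n, dtype=bool)])
    print(d, off_diag.max())                    # shrinks roughly like 1/sqrt(d)
```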
Why does sharpness-aware minimization generalize better than SGD?
The challenge of overfitting, in which the model memorizes the training data and fails to
generalize to test data, has become increasingly significant in the training of large neural …
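The SAM update rule itself (Foret et al., 2021) is simple to state: perturb the weights one normalized gradient step of radius rho "uphill", then descend using the gradient taken at the perturbed point. A minimal sketch on a toy loss; the quadratic objective and step sizes are placeholders, not anything from this paper:

```python
import numpy as np

def loss_grad(w):
    # toy ill-conditioned quadratic: L(w) = 0.5 * w' A w
    A = np.diag([1.0, 100.0])
    return 0.5 * w @ A @ w, A @ w

w = np.array([1.0, 1.0])
eta, rho = 0.005, 0.05                        # learning rate, SAM perturbation radius

for _ in range(200):
    _, g = loss_grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # ascend to a nearby "sharp" point
    _, g_adv = loss_grad(w + eps)                 # gradient at the perturbed weights
    w -= eta * g_adv                              # descend with that gradient
print(loss_grad(w)[0])
```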
Limitations of the NTK for understanding generalization in deep learning
The "Neural Tangent Kernel" (NTK) (Jacot et al., 2018) and its empirical variants have been
proposed as a proxy to capture certain behaviors of real neural networks. In this work, we …
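The empirical NTK referenced here is the Gram matrix of parameter gradients, K(x_i, x_j) = <grad_theta f(x_i), grad_theta f(x_j)>, evaluated at the network's current weights. A hand-rolled sketch for a tiny two-layer ReLU network (the sizes and initialization scales are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 5, 200, 8                       # input dim, width, number of inputs

W = rng.normal(size=(m, d)) / np.sqrt(d)  # first layer
a = rng.normal(size=m) / np.sqrt(m)       # second layer

def param_grad(x):
    """Flattened gradient of f(x) = a . relu(W x) w.r.t. (W, a)."""
    z = W @ x
    dW = np.outer(a * (z > 0), x)         # df/dW, using relu'(z) = 1{z > 0}
    da = np.maximum(z, 0.0)               # df/da
    return np.concatenate([dW.ravel(), da])

X = rng.normal(size=(n, d))
J = np.stack([param_grad(x) for x in X])  # (n, num_params) Jacobian
K = J @ J.T                               # empirical NTK Gram matrix, (n, n)
print(np.linalg.eigvalsh(K)[-1])          # e.g. its largest eigenvalue
```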
Understanding benign overfitting in gradient-based meta learning
Meta learning has demonstrated tremendous success in few-shot learning with limited
supervised data. In those settings, the meta model is usually overparameterized. While the …
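For reference, gradient-based meta learning in the MAML family alternates an inner adaptation step per task with an outer update of the shared initialization. A first-order sketch on toy quadratic tasks; the task distribution, losses, and step sizes are all illustrative, not this paper's setting:

```python
import numpy as np

rng = np.random.default_rng(0)

def grad(theta, target):
    # per-task loss L(theta) = 0.5 * ||theta - target||^2
    return theta - target

theta = np.zeros(3)                                    # meta-initialization
alpha, beta, tasks = 0.5, 0.1, 8                       # inner step, outer step, batch size

for _ in range(100):
    meta_grad = np.zeros_like(theta)
    for _ in range(tasks):
        target = rng.normal(loc=1.0, size=3)           # sample a task
        adapted = theta - alpha * grad(theta, target)  # inner adaptation step
        meta_grad += grad(adapted, target)             # first-order (FOMAML) meta-gradient
    theta -= beta * meta_grad / tasks                  # outer update of the initialization
print(theta)                                           # drifts toward the task mean, ~[1, 1, 1]
```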
Generalization in kernel regression under realistic assumptions
It is by now well-established that modern over-parameterized models seem to elude the bias-
variance tradeoff and generalize well despite overfitting noise. Many recent works attempt to …
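The setting here is kernel (ridge) regression, where the fit is f(x) = k(x, X)(K + lambda I)^{-1} y and sending lambda to 0 yields the minimum-RKHS-norm interpolant of the (noisy) labels. A minimal RBF-kernel sketch, with arbitrary data and bandwidth:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 30

X = np.sort(rng.uniform(-3, 3, size=n))
y = np.sin(X) + 0.3 * rng.normal(size=n)        # noisy targets

def rbf(A, B, bw=0.5):
    return np.exp(-(A[:, None] - B[None, :]) ** 2 / (2 * bw**2))

lam = 1e-6                                      # ridge ~ 0: (near-)interpolation
alpha = np.linalg.solve(rbf(X, X) + lam * np.eye(n), y)

X_test = np.linspace(-3, 3, 200)
f_test = rbf(X_test, X) @ alpha                 # predictions of the interpolant
print(np.max(np.abs(rbf(X, X) @ alpha - y)))    # ~0: the noisy labels are fit
```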
Deep linear networks can benignly overfit when shallow ones do
NS Chatterji, PM Long - Journal of Machine Learning Research, 2023 - jmlr.org
We bound the excess risk of interpolating deep linear networks trained using gradient flow.
In a setting previously used to establish risk bounds for the minimum ℓ2-norm interpolant, we …
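The minimum ℓ2-norm interpolant mentioned in this abstract has a closed form in the overparameterized linear regime (d > n): w = X'(XX')^{-1}y, the least-norm solution of Xw = y. A quick sketch in which the dimensions, signal, and noise level are placeholders:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 20, 500                             # overparameterized: d >> n

w_star = np.zeros(d); w_star[0] = 1.0      # sparse "signal" direction
X = rng.normal(size=(n, d))
y = X @ w_star + 0.1 * rng.normal(size=n)  # noisy labels

w = X.T @ np.linalg.solve(X @ X.T, y)      # min l2-norm interpolant of Xw = y
print(np.max(np.abs(X @ w - y)))           # ~0: interpolates the noise exactly
print(np.linalg.norm(w - w_star))          # the excess-risk question: distance to w_star
```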
Phase transition from clean training to adversarial training
Adversarial training is an important algorithm for obtaining robust machine learning models.
However, numerous empirical results show a great performance degradation from clean …
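As a reference for what "adversarial training" means here: each step first perturbs the inputs to increase the loss (an FGSM-style sign-of-gradient step of size eps), then updates the model on those perturbed inputs. A logistic-regression sketch in which eps, the step sizes, and the data model are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eps, lr = 200, 10, 0.1, 0.5

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
w_true = rng.normal(size=d)
X = rng.normal(size=(n, d))
y = np.sign(X @ w_true)                               # labels in {-1, +1}

w = np.zeros(d)
for _ in range(300):
    # inner maximization (FGSM): move each x one eps-step uphill on its loss
    s = sigmoid(-y * (X @ w))
    X_adv = X + eps * np.sign(-y[:, None] * s[:, None] * w[None, :])
    # outer minimization: logistic-loss gradient step on the perturbed batch
    s_adv = sigmoid(-y * (X_adv @ w))
    w -= lr * (-(y * s_adv) @ X_adv) / n
print(np.mean(np.sign(X @ w) == y))                   # clean training accuracy
```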
Unveil benign overfitting for transformer in vision: Training dynamics, convergence, and generalization
Transformers have demonstrated great power in the recent development of large
foundation models. In particular, the Vision Transformer (ViT) has brought revolutionary …
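The ViT ingredient whose training dynamics such analyses track is self-attention over patch embeddings, softmax(QK'/sqrt(d_k))V applied per input. A minimal single-head sketch; the sizes are arbitrary, and real ViTs add multi-head attention, skip connections, and MLP blocks:

```python
import numpy as np

rng = np.random.default_rng(0)
P, d, dk = 9, 16, 8                       # patches per image, embed dim, head dim

Wq, Wk, Wv = (rng.normal(size=(d, dk)) / np.sqrt(d) for _ in range(3))

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def attention(X):
    """Single-head self-attention: X is (P, d) patch embeddings."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    A = softmax(Q @ K.T / np.sqrt(dk))    # (P, P) attention weights
    return A @ V                          # (P, dk) attended patch features

X = rng.normal(size=(P, d))               # embedded patches of one image
print(attention(X).shape)                 # (9, 8)
```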