When do flat minima optimizers work?
Recently, flat-minima optimizers, which seek to find parameters in low-loss neighborhoods,
have been shown to improve a neural network's generalization performance over stochastic …
Temperature balancing, layer-wise weight analysis, and neural network training
Regularization in modern machine learning is crucial, and it can take various forms in
algorithmic design: training set, model family, error function, regularization terms, and …
Test accuracy vs. generalization gap: Model selection in NLP without accessing training or testing data
Selecting suitable architecture parameters and training hyperparameters is essential for
enhancing machine learning (ML) model performance. Several recent empirical studies …
When are ensembles really effective?
Ensembling has a long history in statistical data analysis, with many impactful applications.
However, in many modern machine learning settings, the benefits of ensembling are less …
Understanding robust learning through the lens of representation similarities
Abstract: Representation learning, i.e., the generation of representations useful for
downstream applications, is a task of fundamental importance that underlies much of the …
Minimum norm interpolation by perceptra: Explicit regularization and implicit bias
J Park, I Pelakh, S Wojtowytsch - Advances in Neural …, 2023 - proceedings.neurips.cc
We investigate how shallow ReLU networks interpolate between known regions. Our
analysis shows that empirical risk minimizers converge to a minimum norm interpolant as …
Evaluating natural language processing models with generalization metrics that do not need access to any training or testing data
Y Yang, R Theisen, L Hodgkinson, JE Gonzalez… - arxiv.org
… Lens: A Simple Model Provides Empirical Insights On Grokking, Gradient Boosting & Beyond
Deep learning sometimes appears to work in unexpected ways. In pursuit of a deeper
understanding of its surprising behaviors, we investigate the utility of a simple yet accurate …