High dimensional analysis reveals conservative sharpening and a stochastic edge of stability
Recent empirical and theoretical work has shown that the dynamics of the large eigenvalues
of the training loss Hessian have some remarkably robust features across models and …
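As a rough illustration of how such Hessian eigenvalue dynamics are usually monitored (not a method taken from this paper), the sketch below estimates the largest eigenvalue of the training loss Hessian via power iteration on Hessian-vector products; the function name, iteration count, and tolerance are assumptions.

```python
import torch

def top_hessian_eigenvalue(loss, params, iters=20):
    """Estimate the largest Hessian eigenvalue of `loss` w.r.t. `params`
    via power iteration on Hessian-vector products (illustrative sketch)."""
    params = [p for p in params if p.requires_grad]
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # Random unit starting vector, flattened over all parameter tensors.
    v = [torch.randn_like(p) for p in params]
    norm = torch.sqrt(sum((x * x).sum() for x in v))
    v = [x / norm for x in v]
    eig = 0.0
    for _ in range(iters):
        # Hessian-vector product Hv via double backpropagation.
        hv = torch.autograd.grad(grads, params, grad_outputs=v, retain_graph=True)
        # Rayleigh quotient v^T H v with unit-norm v.
        eig = sum((h * x).sum() for h, x in zip(hv, v)).item()
        norm = torch.sqrt(sum((h * h).sum() for h in hv))
        v = [h / (norm + 1e-12) for h in hv]
    return eig
```

In edge-of-stability studies, an estimate like this is typically compared against the gradient-descent stability threshold 2/η for step size η; how that threshold shifts in the stochastic setting is the kind of question the title refers to.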
Guiding Two-Layer Neural Network Lipschitzness via Gradient Descent Learning Rate Constraints
We demonstrate that applying an eventual decay to the learning rate (LR) in empirical risk
minimization (ERM), where the mean-squared-error loss is minimized using standard …
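To make the setting concrete, here is a minimal sketch (not the paper's experiment) of full-batch gradient descent on an MSE loss for a two-layer ReLU network with an eventual learning-rate decay, while logging a crude Lipschitz upper bound given by the product of the layers' spectral norms; all dimensions, learning rates, and the decay step are assumptions.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X, y = torch.randn(256, 10), torch.randn(256, 1)          # synthetic regression data
model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
loss_fn = nn.MSELoss()

lr_initial, lr_decayed, decay_at, steps = 1e-1, 1e-2, 500, 1000
opt = torch.optim.SGD(model.parameters(), lr=lr_initial)

for step in range(steps):
    if step == decay_at:                                   # "eventual" LR decay
        for group in opt.param_groups:
            group["lr"] = lr_decayed
    opt.zero_grad()
    loss = loss_fn(model(X), y)                            # full-batch GD on MSE
    loss.backward()
    opt.step()
    if step % 100 == 0:
        with torch.no_grad():
            # Product of layer spectral norms upper-bounds the network's
            # Lipschitz constant (ReLU is 1-Lipschitz).
            lip_bound = (torch.linalg.matrix_norm(model[0].weight, ord=2)
                         * torch.linalg.matrix_norm(model[2].weight, ord=2))
        print(f"step {step}: loss {loss.item():.4f}, "
              f"Lipschitz upper bound {lip_bound.item():.2f}")
```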