High dimensional analysis reveals conservative sharpening and a stochastic edge of stability

A Agarwala, J Pennington - arXiv preprint arXiv:2404.19261, 2024 - arxiv.org
Recent empirical and theoretical work has shown that the dynamics of the large eigenvalues
of the training loss Hessian have some remarkably robust features across models and …
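Below is a minimal, illustrative JAX sketch (not the paper's code) of the quantity this line of work tracks: the largest eigenvalue of the training-loss Hessian ("sharpness") monitored along full-batch gradient descent and compared against the stability threshold 2/eta. The toy two-layer model, data, and learning rate are assumptions chosen for illustration only.

import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(0)
X = jax.random.normal(key, (32, 4))           # toy inputs (assumed, not from the paper)
y = jnp.sin(X @ jnp.ones(4))                  # toy regression targets

d_in, d_hidden = 4, 16
k1, k2 = jax.random.split(key)
params = jnp.concatenate([
    jax.random.normal(k1, (d_in * d_hidden,)) / jnp.sqrt(d_in),
    jax.random.normal(k2, (d_hidden,)) / jnp.sqrt(d_hidden),
])

def loss(p):
    # two-layer tanh network with mean-squared-error loss
    W = p[: d_in * d_hidden].reshape(d_in, d_hidden)
    v = p[d_in * d_hidden:]
    return jnp.mean((jnp.tanh(X @ W) @ v - y) ** 2)

eta = 0.05                                    # full-batch GD step size (illustrative)
grad_fn = jax.jit(jax.grad(loss))
hess_fn = jax.jit(jax.hessian(loss))

for step in range(200):
    params = params - eta * grad_fn(params)
    if step % 20 == 0:
        # largest Hessian eigenvalue ("sharpness") vs. the stability threshold 2/eta
        sharpness = float(jnp.linalg.eigvalsh(hess_fn(params))[-1])
        print(f"step {step:3d}  loss {float(loss(params)):.4f}  "
              f"sharpness {sharpness:.2f}  2/eta = {2 / eta:.2f}")

In edge-of-stability experiments the printed sharpness typically rises toward, and then hovers near, 2/eta; the sketch only exposes the measurement, not the paper's high-dimensional analysis.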

Guiding Two-Layer Neural Network Lipschitzness via Gradient Descent Learning Rate Constraints

K Sung, A Kratsios, N Forman - arXiv preprint arXiv:2502.03792, 2025 - arxiv.org
We demonstrate that applying an eventual decay to the learning rate (LR) in empirical risk
minimization (ERM), where the mean-squared-error loss is minimized using standard …
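The following is a minimal, illustrative JAX sketch, not the paper's construction, of ERM with an eventual learning-rate decay: a two-layer network is trained on a mean-squared-error loss with a constant step size that is decayed after a fixed number of steps. The schedule, architecture, and constants are assumptions for illustration.

import jax
import jax.numpy as jnp

key = jax.random.PRNGKey(1)
X = jax.random.normal(key, (64, 8))           # toy inputs (assumed)
y = jnp.cos(X @ jnp.ones(8))                  # toy regression targets

d_in, d_hidden = 8, 32
k1, k2 = jax.random.split(key)
params = (
    jax.random.normal(k1, (d_in, d_hidden)) / jnp.sqrt(d_in),
    jax.random.normal(k2, (d_hidden,)) / jnp.sqrt(d_hidden),
)

def mse(params):
    # two-layer tanh network, mean-squared-error empirical risk
    W, v = params
    return jnp.mean((jnp.tanh(X @ W) @ v - y) ** 2)

grad_fn = jax.jit(jax.grad(mse))

def lr_schedule(step, eta0=0.1, decay_start=500):
    # constant step size, then an eventual 1/sqrt(t) decay (illustrative choice)
    if step < decay_start:
        return eta0
    return eta0 / (1.0 + step - decay_start) ** 0.5

for step in range(1000):
    grads = grad_fn(params)
    eta = lr_schedule(step)
    params = jax.tree_util.tree_map(lambda p, g: p - eta * g, params, grads)
    if step % 200 == 0:
        print(f"step {step:4d}  lr {eta:.4f}  mse {float(mse(params)):.4f}")

The specific decay rule above is only one common choice; the listed paper studies how constraints on the learning rate relate to the trained network's Lipschitz constant, which the sketch does not attempt to reproduce.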