High-dimensional limit theorems for SGD: Effective dynamics and critical scaling

G Ben Arous, R Gheissari… - Advances in neural …, 2022 - proceedings.neurips.cc
We study the scaling limits of stochastic gradient descent (SGD) with constant step-size in
the high-dimensional regime. We prove limit theorems for the trajectories of summary …
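
Below is a minimal sketch of the kind of experiment this line of work formalizes: constant-step-size online SGD in dimension d, with the trajectory compressed into a one-dimensional summary statistic (here, the overlap with the ground-truth direction in a noisy linear regression model). The model, the 1/d step-size scaling, and all constants are illustrative assumptions, not the paper's setting.

```python
import numpy as np

# Online SGD with a constant step size on noisy linear regression in dimension d,
# one fresh Gaussian sample per step. We track a one-dimensional summary
# statistic of the iterate: its overlap with the ground-truth direction.
# The step size is scaled like 1/d, a dimension-dependent scaling that keeps
# constant-step-size online SGD stable in this toy setting.
rng = np.random.default_rng(0)
d = 500                  # ambient dimension
eta = 1.0 / d            # step size, scaled with dimension
sigma = 0.5              # label-noise level
n_steps = 20_000

theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)   # unit-norm ground-truth direction
theta = np.zeros(d)

overlap = np.empty(n_steps)                # summary statistic <theta_t, theta_star>
for t in range(n_steps):
    x = rng.normal(size=d)                       # fresh isotropic sample
    y = x @ theta_star + sigma * rng.normal()    # noisy linear label
    grad = (theta @ x - y) * x                   # gradient of 0.5 * (theta.x - y)^2
    theta -= eta * grad
    overlap[t] = theta @ theta_star

for frac in (0.01, 0.1, 1.0):
    t = int(frac * n_steps) - 1
    print(f"step {t + 1:6d}: overlap = {overlap[t]:.3f}")
```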

Learning threshold neurons via edge of stability

K Ahn, S Bubeck, S Chewi, YT Lee… - Advances in Neural …, 2023 - proceedings.neurips.cc
Existing analyses of neural network training often operate under the unrealistic assumption
of an extremely small learning rate. This lies in stark contrast to practical wisdom and …

Large Stepsize Gradient Descent for Logistic Loss: Non-Monotonicity of the Loss Improves Optimization Efficiency

J Wu, PL Bartlett, M Telgarsky… - The Thirty Seventh …, 2024 - proceedings.mlr.press
We consider gradient descent (GD) with a constant stepsize applied to logistic
regression with linearly separable data, where the constant stepsize $\eta$ is so large that …
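
As a rough numerical illustration of the title's phenomenon, the sketch below runs full-batch GD on a two-point linearly separable logistic regression problem with a stepsize far above the stability threshold 2/(local smoothness) at initialization; the dataset, initialization, and stepsize are illustrative choices, not the paper's construction. The printed loss spikes in the first couple of iterations and then drops rapidly.

```python
import numpy as np

# Full-batch GD on logistic regression with linearly separable data and a
# deliberately large constant stepsize: the loss is not monotone early on,
# but the iterate eventually aligns with a separating direction and the
# loss then decreases quickly.
X = np.array([[1.0, 4.0],
              [1.0, -4.0]])           # two separable points, both labeled +1
y = np.array([1.0, 1.0])
Z = y[:, None] * X                    # signed features y_i * x_i

def sigmoid(z):
    # numerically stable logistic function
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    expz = np.exp(z[~pos])
    out[~pos] = expz / (1.0 + expz)
    return out

w = np.array([0.0, 1.0])              # start off the max-margin direction
eta = 10.0                            # far above 2 / smoothness at the start
for t in range(15):
    margins = Z @ w                                # y_i <x_i, w>
    loss = np.mean(np.logaddexp(0.0, -margins))    # logistic loss
    print(f"iter {t:2d}: loss = {loss:10.4f}")
    grad = -(Z.T @ sigmoid(-margins)) / len(y)     # gradient of the loss
    w -= eta * grad
```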

Large stepsize gradient descent for non-homogeneous two-layer networks: Margin improvement and fast optimization

Y Cai, J Wu, S Mei, M Lindsey… - Advances in Neural …, 2025 - proceedings.neurips.cc
The typical training of neural networks using large stepsize gradient descent (GD) under the
logistic loss often involves two distinct phases, where the empirical risk oscillates in the first …

(S)GD over Diagonal Linear Networks: Implicit Bias, Large Stepsizes and Edge of Stability

M Even, S Pesme, S Gunasekar… - Advances in Neural …, 2023 - proceedings.neurips.cc
In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit
regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over 2 …
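
A minimal instantiation of the architecture in the title, assuming a toy sparse-regression task of my own choosing (sizes, initialization scale, and stepsize are all illustrative): a 2-layer diagonal linear network w = u ⊙ v trained by full-batch GD. With a small initialization scale the recovered solution tends to be sparse (low l1 norm), which is the kind of implicit regularisation the paper analyzes.

```python
import numpy as np

# 2-layer diagonal linear network: prediction <u * v, x> (elementwise product),
# trained with full-batch GD on an underdetermined sparse regression problem.
rng = np.random.default_rng(1)
n, d = 20, 40                      # fewer samples than dimensions
alpha, eta, n_iters = 0.1, 0.02, 50_000

w_star = np.zeros(d)
w_star[:3] = 1.0                   # 3-sparse ground truth
X = rng.normal(size=(n, d))
y = X @ w_star                     # noiseless labels

u = alpha * np.ones(d)             # small, balanced initialization
v = alpha * np.ones(d)
for _ in range(n_iters):
    r = (X @ (u * v) - y) / n      # scaled residual
    g = X.T @ r                    # gradient with respect to w = u * v
    u, v = u - eta * v * g, v - eta * u * g   # chain rule through w = u * v

w = u * v
print("train RMSE      :", np.sqrt(np.mean((X @ w - y) ** 2)))
print("l1 norm of w    :", np.abs(w).sum(), "(ground truth:", np.abs(w_star).sum(), ")")
print("5 largest |w_i| :", np.round(np.sort(np.abs(w))[-5:], 3))
```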

Bifurcations and loss jumps in RNN training

L Eisenmann, Z Monfared, N Göring… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recurrent neural networks (RNNs) are popular machine learning tools for modeling and
forecasting sequential data and for inferring dynamical systems (DS) from observed time …

Implicit bias of gradient descent for logistic regression at the edge of stability

J Wu, V Braverman, JD Lee - Advances in Neural …, 2023 - proceedings.neurips.cc
Recent research has observed that in machine learning optimization, gradient descent (GD)
often operates at the edge of stability (EoS) [Cohen et al., 2021], where the stepsizes are set …
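
To make the quantity in question concrete, here is a small sketch (toy dataset and stepsize of my own choosing, not the authors' setup) that runs GD on separable logistic regression and prints the sharpness, i.e. the largest Hessian eigenvalue, against the stability level 2/η: the sharpness starts above 2/η and falls below it once the margins grow.

```python
import numpy as np

# GD on separable logistic regression with a stepsize eta chosen so that the
# sharpness (largest Hessian eigenvalue) at initialization exceeds 2/eta.
X = np.array([[2.0, 1.0], [1.5, 2.0], [3.0, 0.5],
              [-2.0, -1.0], [-1.0, -2.5], [-3.0, -0.3]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])   # linearly separable labels
Z = y[:, None] * X                                # signed features
n, eta = len(y), 4.0

def sigmoid(z):
    # numerically stable logistic function
    return np.where(z >= 0,
                    1.0 / (1.0 + np.exp(-np.abs(z))),
                    np.exp(-np.abs(z)) / (1.0 + np.exp(-np.abs(z))))

w = np.zeros(2)
for t in range(10):
    m = Z @ w                                     # margins y_i <x_i, w>
    p = sigmoid(m)
    loss = np.mean(np.logaddexp(0.0, -m))         # logistic loss
    H = (X.T * (p * (1.0 - p))) @ X / n           # Hessian of the logistic loss
    sharpness = np.linalg.eigvalsh(H)[-1]
    print(f"iter {t}: loss = {loss:.4f}, sharpness = {sharpness:.4f}, 2/eta = {2.0 / eta:.2f}")
    grad = -(Z.T @ (1.0 - p)) / n                 # uses sigmoid(-m) = 1 - sigmoid(m)
    w -= eta * grad
```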

Understanding multi-phase optimization dynamics and rich nonlinear behaviors of ReLU networks

M Wang, C Ma - Advances in Neural Information Processing …, 2023 - proceedings.neurips.cc
The training process of ReLU neural networks often exhibits complicated nonlinear
phenomena. The nonlinearity of models and non-convexity of loss pose significant …

Gradient descent monotonically decreases the sharpness of gradient flow solutions in scalar networks and beyond

I Kreisler, MS Nacson, D Soudry… - … on Machine Learning, 2023 - proceedings.mlr.press
Recent research shows that when Gradient Descent (GD) is applied to neural networks, the
loss almost never decreases monotonically. Instead, the loss oscillates as gradient descent …
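
As a minimal illustration of the sharpness quantity these results concern, the sketch below fits a depth-2 scalar network f = a·b to a single target with GD at several stepsizes; for this loss the largest Hessian eigenvalue at any global minimum equals a² + b², so more balanced minimizers are flatter. The initialization, target, and stepsizes are illustrative choices; larger stepsizes typically end at flatter (lower-sharpness) solutions.

```python
# Depth-2 scalar network f = a * b fit to the target 1 with the squared loss
# L(a, b) = 0.5 * (a * b - 1)^2. At a global minimum (a * b = 1) the Hessian
# eigenvalues are 0 and a^2 + b^2, so a^2 + b^2 is the sharpness there.
def train(eta, a=2.5, b=0.1, n_iters=5000):
    for _ in range(n_iters):
        r = a * b - 1.0                               # residual
        a, b = a - eta * b * r, b - eta * a * r       # simultaneous GD step
    return a, b

for eta in (0.01, 0.1, 0.3):
    a, b = train(eta)
    print(f"eta = {eta:4.2f}: a*b = {a * b:.6f}, sharpness a^2 + b^2 = {a * a + b * b:.3f}")
```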