High-dimensional limit theorems for SGD: Effective dynamics and critical scaling

G Ben Arous, R Gheissari… - Advances in neural …, 2022 - proceedings.neurips.cc
We study the scaling limits of stochastic gradient descent (SGD) with constant step-size in
the high-dimensional regime. We prove limit theorems for the trajectories of summary …
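
Below is a minimal sketch of the kind of experiment this line of work formalizes: constant-step-size online SGD in dimension d, with the trajectory compressed into a one-dimensional summary statistic (here, the overlap with the ground-truth direction in a noisy linear regression model). The model, the 1/d step-size scaling, and all constants are illustrative assumptions, not the paper's setting.

```python
import numpy as np

# Online SGD with a constant step size on noisy linear regression in dimension d,
# one fresh Gaussian sample per step. We track a one-dimensional summary
# statistic of the iterate: its overlap with the ground-truth direction.
# The step size is scaled like 1/d, a dimension-dependent scaling that keeps
# constant-step-size online SGD stable in this toy setting.
rng = np.random.default_rng(0)
d = 500                  # ambient dimension
eta = 1.0 / d            # step size, scaled with dimension
sigma = 0.5              # label-noise level
n_steps = 20_000

theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)   # unit-norm ground-truth direction
theta = np.zeros(d)

overlap = np.empty(n_steps)                # summary statistic <theta_t, theta_star>
for t in range(n_steps):
    x = rng.normal(size=d)                       # fresh isotropic sample
    y = x @ theta_star + sigma * rng.normal()    # noisy linear label
    grad = (theta @ x - y) * x                   # gradient of 0.5 * (theta.x - y)^2
    theta -= eta * grad
    overlap[t] = theta @ theta_star

for frac in (0.01, 0.1, 1.0):
    t = int(frac * n_steps) - 1
    print(f"step {t + 1:6d}: overlap = {overlap[t]:.3f}")
```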

Learning threshold neurons via edge of stability

K Ahn, S Bubeck, S Chewi, YT Lee… - Advances in Neural …, 2023 - proceedings.neurips.cc
Existing analyses of neural network training often operate under the unrealistic assumption
of an extremely small learning rate. This lies in stark contrast to practical wisdom and …

Large Stepsize Gradient Descent for Logistic Loss: Non-Monotonicity of the Loss Improves Optimization Efficiency

J Wu, PL Bartlett, M Telgarsky… - The Thirty Seventh …, 2024 - proceedings.mlr.press
We consider gradient descent (GD) with a constant stepsize applied to logistic
regression with linearly separable data, where the constant stepsize $\eta$ is so large that …
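
As a rough numerical illustration of the title's phenomenon, the sketch below runs full-batch GD on a two-point linearly separable logistic regression problem with a stepsize far above the stability threshold 2/(local smoothness) at initialization; the dataset, initialization, and stepsize are illustrative choices, not the paper's construction. The printed loss spikes in the first couple of iterations and then drops rapidly.

```python
import numpy as np

# Full-batch GD on logistic regression with linearly separable data and a
# deliberately large constant stepsize: the loss is not monotone early on,
# but the iterate eventually aligns with a separating direction and the
# loss then decreases quickly.
X = np.array([[1.0, 4.0],
              [1.0, -4.0]])           # two separable points, both labeled +1
y = np.array([1.0, 1.0])
Z = y[:, None] * X                    # signed features y_i * x_i

def sigmoid(z):
    # numerically stable logistic function
    out = np.empty_like(z)
    pos = z >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-z[pos]))
    expz = np.exp(z[~pos])
    out[~pos] = expz / (1.0 + expz)
    return out

w = np.array([0.0, 1.0])              # start off the max-margin direction
eta = 10.0                            # far above 2 / smoothness at the start
for t in range(15):
    margins = Z @ w                                # y_i <x_i, w>
    loss = np.mean(np.logaddexp(0.0, -margins))    # logistic loss
    print(f"iter {t:2d}: loss = {loss:10.4f}")
    grad = -(Z.T @ sigmoid(-margins)) / len(y)     # gradient of the loss
    w -= eta * grad
```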

Large stepsize gradient descent for non-homogeneous two-layer networks: Margin improvement and fast optimization

Y Cai, J Wu, S Mei, M Lindsey… - Advances in Neural …, 2025 - proceedings.neurips.cc
The typical training of neural networks using large stepsize gradient descent (GD) under the
logistic loss often involves two distinct phases, where the empirical risk oscillates in the first …

(S)GD over Diagonal Linear Networks: Implicit Bias, Large Stepsizes and Edge of Stability

M Even, S Pesme, S Gunasekar… - Advances in Neural …, 2023 - proceedings.neurips.cc
In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit
regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over 2 …
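
A minimal instantiation of the architecture in the title, assuming a toy sparse-regression task of my own choosing (sizes, initialization scale, and stepsize are all illustrative): a 2-layer diagonal linear network w = u ⊙ v trained by full-batch GD. With a small initialization scale the recovered solution tends to be sparse (low l1 norm), which is the kind of implicit regularisation the paper analyzes.

```python
import numpy as np

# 2-layer diagonal linear network: prediction <u * v, x> (elementwise product),
# trained with full-batch GD on an underdetermined sparse regression problem.
rng = np.random.default_rng(1)
n, d = 20, 40                      # fewer samples than dimensions
alpha, eta, n_iters = 0.1, 0.02, 50_000

w_star = np.zeros(d)
w_star[:3] = 1.0                   # 3-sparse ground truth
X = rng.normal(size=(n, d))
y = X @ w_star                     # noiseless labels

u = alpha * np.ones(d)             # small, balanced initialization
v = alpha * np.ones(d)
for _ in range(n_iters):
    r = (X @ (u * v) - y) / n      # scaled residual
    g = X.T @ r                    # gradient with respect to w = u * v
    u, v = u - eta * v * g, v - eta * u * g   # chain rule through w = u * v

w = u * v
print("train RMSE      :", np.sqrt(np.mean((X @ w - y) ** 2)))
print("l1 norm of w    :", np.abs(w).sum(), "(ground truth:", np.abs(w_star).sum(), ")")
print("5 largest |w_i| :", np.round(np.sort(np.abs(w))[-5:], 3))
```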

Bifurcations and loss jumps in RNN training

L Eisenmann, Z Monfared, N Göring… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recurrent neural networks (RNNs) are popular machine learning tools for modeling and
forecasting sequential data and for inferring dynamical systems (DS) from observed time …

Implicit bias of gradient descent for logistic regression at the edge of stability

J Wu, V Braverman, JD Lee - Advances in Neural …, 2023 - proceedings.neurips.cc
Recent research has observed that in machine learning optimization, gradient descent (GD)
often operates at the edge of stability (EoS) [Cohen et al., 2021], where the stepsizes are set …
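
To make the quantity in question concrete, here is a small sketch (toy dataset and stepsize of my own choosing, not the authors' setup) that runs GD on separable logistic regression and prints the sharpness, i.e. the largest Hessian eigenvalue, against the stability level 2/η: the sharpness starts above 2/η and falls below it once the margins grow.

```python
import numpy as np

# GD on separable logistic regression with a stepsize eta chosen so that the
# sharpness (largest Hessian eigenvalue) at initialization exceeds 2/eta.
X = np.array([[2.0, 1.0], [1.5, 2.0], [3.0, 0.5],
              [-2.0, -1.0], [-1.0, -2.5], [-3.0, -0.3]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])   # linearly separable labels
Z = y[:, None] * X                                # signed features
n, eta = len(y), 4.0

def sigmoid(z):
    # numerically stable logistic function
    return np.where(z >= 0,
                    1.0 / (1.0 + np.exp(-np.abs(z))),
                    np.exp(-np.abs(z)) / (1.0 + np.exp(-np.abs(z))))

w = np.zeros(2)
for t in range(10):
    m = Z @ w                                     # margins y_i <x_i, w>
    p = sigmoid(m)
    loss = np.mean(np.logaddexp(0.0, -m))         # logistic loss
    H = (X.T * (p * (1.0 - p))) @ X / n           # Hessian of the logistic loss
    sharpness = np.linalg.eigvalsh(H)[-1]
    print(f"iter {t}: loss = {loss:.4f}, sharpness = {sharpness:.4f}, 2/eta = {2.0 / eta:.2f}")
    grad = -(Z.T @ (1.0 - p)) / n                 # uses sigmoid(-m) = 1 - sigmoid(m)
    w -= eta * grad
```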

Understanding multi-phase optimization dynamics and rich nonlinear behaviors of ReLU networks

M Wang, C Ma - Advances in Neural Information Processing …, 2023 - proceedings.neurips.cc
The training process of ReLU neural networks often exhibits complicated nonlinear
phenomena. The nonlinearity of models and non-convexity of loss pose significant …

Gradient descent monotonically decreases the sharpness of gradient flow solutions in scalar networks and beyond

I Kreisler, MS Nacson, D Soudry… - … on Machine Learning, 2023 - proceedings.mlr.press
Recent research shows that when Gradient Descent (GD) is applied to neural networks, the
loss almost never decreases monotonically. Instead, the loss oscillates as gradient descent …
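
As a minimal illustration of the sharpness quantity these results concern, the sketch below fits a depth-2 scalar network f = a·b to a single target with GD at several stepsizes; for this loss the largest Hessian eigenvalue at any global minimum equals a² + b², so more balanced minimizers are flatter. The initialization, target, and stepsizes are illustrative choices; larger stepsizes typically end at flatter (lower-sharpness) solutions.

```python
# Depth-2 scalar network f = a * b fit to the target 1 with the squared loss
# L(a, b) = 0.5 * (a * b - 1)^2. At a global minimum (a * b = 1) the Hessian
# eigenvalues are 0 and a^2 + b^2, so a^2 + b^2 is the sharpness there.
def train(eta, a=2.5, b=0.1, n_iters=5000):
    for _ in range(n_iters):
        r = a * b - 1.0                               # residual
        a, b = a - eta * b * r, b - eta * a * r       # simultaneous GD step
    return a, b

for eta in (0.01, 0.1, 0.3):
    a, b = train(eta)
    print(f"eta = {eta:4.2f}: a*b = {a * b:.6f}, sharpness a^2 + b^2 = {a * a + b * b:.3f}")
```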