High-dimensional limit theorems for SGD: Effective dynamics and critical scaling
We study the scaling limits of stochastic gradient descent (SGD) with constant step-size in
the high-dimensional regime. We prove limit theorems for the trajectories of summary …
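For orientation, a minimal numpy sketch of the kind of object such limit theorems concern: constant step-size online SGD in high dimension, with a low-dimensional summary statistic (here the overlap with a planted signal w_star) tracked along the trajectory. The task and the statistic are illustrative assumptions, not the paper's construction.
```python
# Illustration only: constant step-size online SGD on high-dimensional least
# squares with a planted signal, tracking a one-dimensional summary statistic.
import numpy as np

rng = np.random.default_rng(0)
d = 2000                       # ambient dimension (illustrative)
eta = 0.5 / d                  # constant step size, scaled with dimension
w_star = rng.standard_normal(d) / np.sqrt(d)   # planted signal, roughly unit norm
w = np.zeros(d)                # SGD iterate

overlaps = []
for t in range(20000):
    x = rng.standard_normal(d)                 # fresh sample (online / one-pass)
    y = x @ w_star + 0.1 * rng.standard_normal()
    grad = (w @ x - y) * x                     # gradient of 0.5 * (x.w - y)**2
    w -= eta * grad                            # constant step-size SGD update
    if t % 2000 == 0:
        overlaps.append(w @ w_star / (w_star @ w_star))   # summary statistic

print("overlap along the trajectory:", np.round(overlaps, 3))
```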
(S)GD over Diagonal Linear Networks: Implicit Bias, Large Stepsizes and Edge of Stability
In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit
regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over $2 …
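A minimal sketch of the model class named in the title, under the common u*v parameterisation of a 2-layer diagonal linear network; the sparse regression task, initialisation scale alpha, and step size below are illustrative assumptions rather than the paper's exact setting.
```python
# Gradient descent on a 2-layer diagonal linear network: the predictor is
# beta = u * v (elementwise), trained on a toy sparse regression problem.
import numpy as np

rng = np.random.default_rng(1)
n, d = 40, 100
beta_star = np.zeros(d); beta_star[:3] = 1.0     # sparse, nonnegative ground truth
X = rng.standard_normal((n, d))
y = X @ beta_star

alpha = 0.1                                      # small initialisation scale
u = alpha * np.ones(d)                           # u = v at init keeps the sketch
v = alpha * np.ones(d)                           # simple (beta = u**2 >= 0 here)
eta = 0.02                                       # step size

for _ in range(20000):
    r = X @ (u * v) - y                          # residuals of the effective predictor
    g = X.T @ r / n                              # gradient w.r.t. beta = u * v
    u, v = u - eta * g * v, v - eta * g * u      # chain rule through the product

print("recovered beta (first 5 coords):", np.round(u * v, 3)[:5])
```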
A modern look at the relationship between sharpness and generalization
Sharpness of minima is a promising quantity that can correlate with generalization in deep
networks and, when optimized during training, can improve generalization. However …
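Sharpness in this line of work is typically measured through the largest eigenvalue of the loss Hessian. A small sketch of one standard way to estimate it, power iteration on Hessian-vector products; the logistic-regression objective and the finite-difference HVP are stand-in assumptions, not the paper's protocol.
```python
# Estimate "sharpness" (top Hessian eigenvalue) by power iteration on
# finite-difference Hessian-vector products of a toy logistic-regression loss.
import numpy as np

rng = np.random.default_rng(2)
n, d = 200, 20
X = rng.standard_normal((n, d))
y = np.sign(X @ rng.standard_normal(d))
w = 0.1 * rng.standard_normal(d)

def grad(w):
    margins = y * (X @ w)
    s = 1.0 / (1.0 + np.exp(margins))            # sigmoid(-margin)
    return -(X * (y * s)[:, None]).mean(axis=0)  # gradient of mean logistic loss

def hvp(w, v, r=1e-4):
    # central finite difference of the gradient along v approximates H @ v
    return (grad(w + r * v) - grad(w - r * v)) / (2 * r)

v = rng.standard_normal(d)
v /= np.linalg.norm(v)
for _ in range(100):                             # power iteration
    hv = hvp(w, v)
    v = hv / np.linalg.norm(hv)
lam_max = float(v @ hvp(w, v))                   # Rayleigh quotient ~ sharpness
print("estimated top Hessian eigenvalue:", round(lam_max, 4))
```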
Learning threshold neurons via edge of stability
Existing analyses of neural network training often operate under the unrealistic assumption
of an extremely small learning rate. This lies in stark contrast to practical wisdom and …
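The threshold behind the "edge of stability" terminology is the classical quadratic stability condition eta < 2/lambda, which the "extremely small learning rate" assumption sidesteps. A tiny check of both regimes:
```python
# On a quadratic loss L(w) = 0.5 * lam * w**2, gradient descent contracts
# iff the step size eta < 2 / lam.  Check step sizes on both sides of 2/lam.
lam = 10.0
for eta in (0.15, 0.19, 0.21):        # 2 / lam = 0.2 is the critical step size
    w = 1.0
    for _ in range(50):
        w -= eta * lam * w            # GD update: w <- (1 - eta * lam) * w
    print(f"eta={eta}: |w| after 50 steps = {abs(w):.3g}")
```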
Dynamics of finite width kernel and prediction fluctuations in mean field neural networks
We analyze the dynamics of finite width effects in wide but finite feature learning neural
networks. Starting from a dynamical mean field theory description of infinite width deep …
How Sharpness-Aware Minimization Minimizes Sharpness?
Sharpness-Aware Minimization (SAM) is a highly effective regularization technique for
improving the generalization of deep neural networks for various settings. However, the …
The dynamics of sharpness-aware minimization: Bouncing across ravines and drifting towards wide minima
We consider Sharpness-Aware Minimization (SAM), a gradient-based optimization method
for deep networks that has exhibited performance improvements on image and language …
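Both SAM entries above consider the same underlying update: take an ascent step of radius rho along the normalized gradient, then descend using the gradient evaluated at that perturbed point. A sketch on a toy quadratic (the objective and hyperparameters are assumptions); note that with a fixed rho the iterates hover near, rather than at, the minimum, the "bouncing" behaviour the title above alludes to.
```python
# The SAM update: perturb the weights by rho * g / ||g||, then descend with
# the gradient taken at the perturbed point.  Toy quadratic objective only.
import numpy as np

rng = np.random.default_rng(3)
d = 10
A = rng.standard_normal((d, d))
H = A @ A.T / d + 0.1 * np.eye(d)                 # PSD "Hessian" of a toy loss
loss = lambda w: 0.5 * w @ H @ w
grad = lambda w: H @ w

w = rng.standard_normal(d)
eta, rho = 0.05, 0.1                              # step size and SAM radius
for _ in range(500):
    g = grad(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # ascent to a nearby point
    w = w - eta * grad(w + eps)                   # descend with perturbed gradient
print("final loss (hovers near, not at, the minimum):", round(float(loss(w)), 6))
```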
Understanding multi-phase optimization dynamics and rich nonlinear behaviors of ReLU networks
The training process of ReLU neural networks often exhibits complicated nonlinear
phenomena. The nonlinearity of models and non-convexity of loss pose significant …
Implicit Bias of AdamW: $\ell_\infty$-Norm Constrained Optimization
S Xie, Z Li - International Conference on Machine Learning, 2024 - proceedings.mlr.press
Adam with decoupled weight decay, also known as AdamW, is widely acclaimed for its
superior performance in language modeling tasks, surpassing Adam with $\ell_2 …
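The distinction the abstract refers to: Adam with $\ell_2$ regularization folds the decay term lambda*w into the gradient before the moment estimates, whereas AdamW applies the decay directly to the weights ("decoupled"). A single-vector sketch of the AdamW step on a toy quadratic (hyperparameters illustrative):
```python
# AdamW step with decoupled weight decay: the wd * w term is added to the
# update itself, not to the gradient that feeds the moment estimates.
import numpy as np

def adamw_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=1e-2):
    m = b1 * m + (1 - b1) * g                    # first-moment estimate
    v = b2 * v + (1 - b2) * g * g                # second-moment estimate
    m_hat = m / (1 - b1 ** t)                    # bias corrections
    v_hat = v / (1 - b2 ** t)
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)   # decoupled decay
    return w, m, v

# toy usage: minimize 0.5 * ||w - target||^2
rng = np.random.default_rng(4)
target = rng.standard_normal(50)
w = np.zeros(50)
m = np.zeros(50); v = np.zeros(50)
for t in range(1, 5001):
    g = w - target                               # gradient of the toy loss
    w, m, v = adamw_step(w, g, m, v, t)
print("distance to target:", round(float(np.linalg.norm(w - target)), 3))
```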
SAM operates far from home: eigenvalue regularization as a dynamical phenomenon
The Sharpness-Aware Minimization (SAM) optimization algorithm has been shown
to control large eigenvalues of the loss Hessian and provide generalization benefits in a …