On the implicit bias in deep-learning algorithms

G Vardi - Communications of the ACM, 2023 - dl.acm.org
Deep learning has been highly successful in recent years and has led to dramatic
improvements in multiple domains …

Understanding gradient descent on the edge of stability in deep learning

S Arora, Z Li, A Panigrahi - International Conference on …, 2022 - proceedings.mlr.press
Deep learning experiments by Cohen et al. (2021) using deterministic Gradient
Descent (GD) revealed an Edge of Stability (EoS) phase when learning rate (LR) and …
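
A minimal numpy sketch of what "sharpness" means in these EoS results, using a toy two-parameter loss of my own rather than anything from the paper: it runs full-batch GD on L(a, b) = 0.5*(a*b - 1)^2 and prints the largest Hessian eigenvalue next to the stability threshold 2/lr. The loss, step size, and initialization are arbitrary illustrative choices.

    # Toy illustration (not the paper's setup): track the sharpness, i.e. the
    # largest Hessian eigenvalue, along a full-batch GD trajectory and compare
    # it with the threshold 2/lr that the EoS literature refers to.
    import numpy as np

    def loss(w):
        a, b = w
        return 0.5 * (a * b - 1.0) ** 2

    def grad(w):
        a, b = w
        r = a * b - 1.0
        return np.array([b * r, a * r])

    def hessian(w):
        a, b = w
        return np.array([[b * b, 2 * a * b - 1.0],
                         [2 * a * b - 1.0, a * a]])

    lr = 0.2                     # deliberately coarse step size
    w = np.array([2.5, 0.1])     # arbitrary initialization
    for step in range(51):
        sharpness = np.linalg.eigvalsh(hessian(w))[-1]   # top eigenvalue
        if step % 10 == 0:
            print(f"step {step:2d}  loss {loss(w):.5f}  "
                  f"sharpness {sharpness:.3f}  2/lr {2 / lr:.1f}")
        w = w - lr * grad(w)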

Understanding the generalization benefit of normalization layers: Sharpness reduction

K Lyu, Z Li, S Arora - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Normalization layers (e.g., Batch Normalization, Layer Normalization) were
introduced to help with optimization difficulties in very deep nets, but they clearly also help …
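
One concrete property behind the sharpness-reduction view (a standard observation, not code from the paper) is that a normalization layer makes the output invariant to positive rescaling of the preceding weights. A minimal numpy sketch, with the affine (gamma, beta) parameters omitted for brevity:

    # Toy check (not from the paper): layer normalization of a linear layer's
    # output is unchanged when the weight matrix is multiplied by a positive
    # constant, up to the small epsilon in the denominator.
    import numpy as np

    def layer_norm(z, eps=1e-6):
        return (z - z.mean()) / (z.std() + eps)

    rng = np.random.default_rng(0)
    W = rng.normal(size=(8, 4))
    x = rng.normal(size=4)

    out = layer_norm(W @ x)
    out_scaled = layer_norm((10.0 * W) @ x)   # rescale the weights
    print(np.allclose(out, out_scaled, atol=1e-4))   # True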

Self-stabilization: The implicit bias of gradient descent at the edge of stability

A Damian, E Nichani, JD Lee - arXiv preprint arXiv:2209.15594, 2022 - arxiv.org
Traditional analyses of gradient descent show that when the largest eigenvalue of the
Hessian, also known as the sharpness $S(\theta)$, is bounded by $2/\eta$, training is …
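
The bound quoted here is the classical descent condition, and it is easiest to see on a one-dimensional quadratic, where the sharpness is just the curvature. A minimal sketch (my own illustration, not the paper's analysis): GD with step size eta contracts exactly when the curvature stays below 2/eta.

    # On f(w) = 0.5 * lam * w^2 the sharpness S is exactly lam, and the GD update
    # multiplies w by (1 - eta * lam), which contracts iff lam < 2/eta.
    import numpy as np

    def run_gd(lam, eta, w0=1.0, steps=30):
        w = w0
        for _ in range(steps):
            w = w - eta * lam * w      # gradient of 0.5*lam*w^2 is lam*w
        return abs(w)

    eta = 0.1                          # so the threshold 2/eta is 20
    for lam in (15.0, 19.0, 21.0, 25.0):
        print(f"sharpness {lam:4.1f}  |w_T| = {run_gd(lam, eta):.2e}")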

Learning threshold neurons via edge of stability

K Ahn, S Bubeck, S Chewi, YT Lee… - Advances in Neural …, 2023 - proceedings.neurips.cc
Existing analyses of neural network training often operate under the unrealistic assumption
of an extremely small learning rate. This lies in stark contrast to practical wisdom and …

Implicit bias of the step size in linear diagonal neural networks

MS Nacson, K Ravichandran… - International …, 2022 - proceedings.mlr.press
Focusing on diagonal linear networks as a model for understanding the implicit bias in
underdetermined models, we show how the gradient descent step size can have a large …
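
For concreteness, a diagonal linear network replaces the linear predictor w with an elementwise product u * v and runs GD on (u, v). The sketch below is a toy instance of my own (data, initialization scale, and step size are arbitrary), not the paper's experiment.

    # Toy diagonal linear network: predictor w = u * v (elementwise), trained by
    # GD on (u, v) for an underdetermined least-squares problem with a sparse target.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 10, 30                          # fewer samples than features
    X = rng.normal(size=(n, d))
    w_star = np.zeros(d)
    w_star[:3] = 1.0                       # sparse ground truth
    y = X @ w_star

    alpha, eta = 0.1, 0.01                 # initialization scale and step size
    u = np.full(d, alpha)
    v = np.full(d, alpha)

    for _ in range(20000):
        g = X.T @ (X @ (u * v) - y) / n    # gradient of 0.5/n * ||Xw - y||^2 in w
        u, v = u - eta * g * v, v - eta * g * u   # chain rule through w = u * v

    w = u * v
    print("train loss:", 0.5 * np.mean((X @ w - y) ** 2))
    print("largest |w| coordinates:", np.argsort(-np.abs(w))[:3])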

Large Stepsize Gradient Descent for Logistic Loss: Non-Monotonicity of the Loss Improves Optimization Efficiency

J Wu, PL Bartlett, M Telgarsky… - The Thirty Seventh …, 2024 - proceedings.mlr.press
We consider gradient descent (GD) with a constant stepsize applied to logistic
regression with linearly separable data, where the constant stepsize $\eta$ is so large that …
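
A minimal sketch of this setting with a toy dataset of my own: four linearly separable points and a deliberately large constant stepsize. In this particular run the logistic loss jumps up on the first step before decreasing, i.e. the trajectory is not monotone.

    # Constant-stepsize GD on the logistic loss with separable toy data; the
    # stepsize is large enough that the first step overshoots and the loss
    # temporarily increases.
    import numpy as np

    X = np.array([[ 1.0,  10.0],
                  [ 1.0,  -1.0],
                  [-1.0, -10.0],
                  [-1.0,   1.0]])
    y = np.array([1.0, 1.0, -1.0, -1.0])

    def logistic_loss(w):
        return np.mean(np.log1p(np.exp(-y * (X @ w))))

    def grad(w):
        s = -y / (1.0 + np.exp(y * (X @ w)))   # d/dm log(1 + exp(-m)) = -1/(1+e^m)
        return X.T @ s / len(y)

    eta = 2.0                                  # large constant stepsize
    w = np.zeros(2)
    for t in range(8):
        print(f"t={t}  loss={logistic_loss(w):.3f}")
        w = w - eta * grad(w)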

Two sides of one coin: the limits of untuned SGD and the power of adaptive methods

J Yang, X Li, I Fatkhullin, N He - Advances in Neural …, 2023 - proceedings.neurips.cc
The classical analysis of Stochastic Gradient Descent (SGD) with polynomially decaying
stepsize $\eta_t = \eta/\sqrt{t}$ relies on a well-tuned $\eta$ depending on problem …
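
The schedule in question is eta_t = eta / sqrt(t) with a fixed base stepsize eta. A minimal sketch applying it as plain SGD to a toy least-squares problem (the problem, eta, and sampling scheme are arbitrary choices of mine; only the schedule matters):

    # Plain SGD with the polynomially decaying stepsize eta_t = eta / sqrt(t)
    # on a toy least-squares problem.
    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 200, 5
    X = rng.normal(size=(n, d))
    w_star = rng.normal(size=d)
    y = X @ w_star + 0.1 * rng.normal(size=n)

    eta = 0.5                                  # the untuned base stepsize
    w = np.zeros(d)
    for t in range(1, 5001):
        i = rng.integers(n)                    # sample one example
        g = (X[i] @ w - y[i]) * X[i]           # gradient of 0.5*(x_i.w - y_i)^2
        w = w - (eta / np.sqrt(t)) * g         # eta_t = eta / sqrt(t)

    print("distance to w_star:", np.linalg.norm(w - w_star))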

Adaptive gradient methods at the edge of stability

JM Cohen, B Ghorbani, S Krishnan, N Agarwal… - arXiv preprint arXiv …, 2022 - arxiv.org
Very little is known about the training dynamics of adaptive gradient methods like Adam in
deep learning. In this paper, we shed light on the behavior of these algorithms in the full …

Implicit bias of gradient descent for logistic regression at the edge of stability

J Wu, V Braverman, JD Lee - Advances in Neural …, 2023 - proceedings.neurips.cc
Recent research has observed that in machine learning optimization, gradient descent (GD)
often operates at the edge of stability (EoS) [Cohen et al., 2021], where the stepsizes are set …
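
The implicit-bias statement in this setting is directional: for separable data, GD on the logistic loss is known to converge in direction to the L2 max-margin separator. A minimal sketch on a symmetric toy dataset of my own, where that direction is (1, 1)/sqrt(2), tracking the cosine between the normalized iterate and the max-margin direction:

    # GD on the logistic loss with separable toy data; the normalized iterate
    # drifts toward the max-margin direction (1, 1)/sqrt(2) for this dataset.
    import numpy as np

    X = np.array([[1.0, 2.0], [2.0, 1.0], [8.0, 1.0],
                  [-1.0, -2.0], [-2.0, -1.0], [-8.0, -1.0]])
    y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
    w_mm = np.array([1.0, 1.0]) / np.sqrt(2.0)   # max-margin direction here

    def grad(w):
        s = -y / (1.0 + np.exp(y * (X @ w)))
        return X.T @ s / len(y)

    eta = 1.0
    w = np.zeros(2)
    for t in range(1, 100001):
        w = w - eta * grad(w)
        if t in (1, 10, 100, 1000, 10000, 100000):
            cos = (w / np.linalg.norm(w)) @ w_mm
            print(f"t={t:6d}  cosine with max-margin direction = {cos:.4f}")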