On the implicit bias in deep-learning algorithms
G Vardi - Communications of the ACM, 2023 - dl.acm.org
Deep learning has been highly successful in recent years and has led to dramatic improvements in multiple domains …
Understanding gradient descent on the edge of stability in deep learning
Deep learning experiments by Cohen et al. (2021) using deterministic Gradient Descent (GD) revealed an Edge of Stability (EoS) phase when learning rate (LR) and …
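For orientation, the EoS observation is stated in terms of the top eigenvalue of the loss Hessian. As a brief sketch of the standard stability heuristic (notation assumed here: $L$ for the training loss, $\theta$ for the parameters, $\eta$ for the learning rate):

$$\theta_{t+1} = \theta_t - \eta \nabla L(\theta_t), \qquad S(\theta) := \lambda_{\max}\big(\nabla^2 L(\theta)\big).$$

On a quadratic, GD decreases the loss monotonically only when $S(\theta) < 2/\eta$; in the EoS phase reported by Cohen et al. (2021), $S(\theta)$ instead rises to roughly $2/\eta$ and hovers there while the training loss continues to decrease non-monotonically over long horizons.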
Understanding the generalization benefit of normalization layers: Sharpness reduction
Normalization layers (e.g., Batch Normalization, Layer Normalization) were introduced to help with optimization difficulties in very deep nets, but they clearly also help …
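One short calculation suggests why sharpness is the relevant quantity for normalized networks (an illustrative derivation under an assumed exact scale-invariance, not a restatement of the paper's argument): if the loss is invariant to rescaling a normalized weight block, $L(c\theta) = L(\theta)$ for all $c > 0$, then differentiating the identity twice gives

$$\nabla L(c\theta) = \tfrac{1}{c}\,\nabla L(\theta), \qquad \nabla^2 L(c\theta) = \tfrac{1}{c^2}\,\nabla^2 L(\theta),$$

so the sharpness measured at $c\theta$ is $S(\theta)/c^2$ and shrinks as the weight norm grows.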
Self-stabilization: The implicit bias of gradient descent at the edge of stability
Traditional analyses of gradient descent show that when the largest eigenvalue of the Hessian, also known as the sharpness $S(\theta)$, is bounded by $2/\eta$, training is …
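The sharpness in this snippet can be tracked numerically during training. Below is a minimal NumPy sketch (the toy model, data, stepsize, and finite-difference scheme are all assumptions made for illustration, not the paper's setup) that runs constant-stepsize GD on a one-hidden-unit regressor and estimates $S(\theta)$ by power iteration on finite-difference Hessian-vector products, so it can be compared with $2/\eta$.

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.normal(size=20)
    y = np.sin(2 * x)                          # arbitrary regression targets

    def loss(theta):                           # model: pred = w2 * tanh(w1 * x)
        w1, w2 = theta
        return 0.5 * np.mean((w2 * np.tanh(w1 * x) - y) ** 2)

    def grad(theta):
        w1, w2 = theta
        h = np.tanh(w1 * x)
        r = w2 * h - y                         # residuals
        return np.array([np.mean(r * w2 * (1 - h ** 2) * x), np.mean(r * h)])

    def sharpness(theta, iters=50, eps=1e-4):
        # Power iteration on finite-difference Hessian-vector products:
        # H v ~ (grad(theta + eps v) - grad(theta - eps v)) / (2 eps).
        v = rng.normal(size=theta.shape)
        v /= np.linalg.norm(v)
        lam = 0.0
        for _ in range(iters):
            hv = (grad(theta + eps * v) - grad(theta - eps * v)) / (2 * eps)
            lam, nrm = float(v @ hv), np.linalg.norm(hv)
            if nrm < 1e-12:
                break
            v = hv / nrm
        return lam

    eta = 0.5                                  # fixed, deliberately large stepsize
    theta = np.array([1.5, 1.5])
    for t in range(201):
        if t % 40 == 0:
            print(f"t={t:3d}  loss={loss(theta):.4f}  "
                  f"sharpness={sharpness(theta):.3f}  2/eta={2 / eta:.3f}")
        theta = theta - eta * grad(theta)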
Learning threshold neurons via edge of stability
Existing analyses of neural network training often operate under the unrealistic assumption
of an extremely small learning rate. This lies in stark contrast to practical wisdom and …
Implicit bias of the step size in linear diagonal neural networks
MS Nacson, K Ravichandran… - International …, 2022 - proceedings.mlr.press
Focusing on diagonal linear networks as a model for understanding the implicit bias in
underdetermined models, we show how the gradient descent step size can have a large …
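To make the model concrete: a diagonal linear network reparameterizes a linear predictor $w$ coordinatewise, e.g. $w = u \odot v$, and gradient descent is run on $(u, v)$ rather than on $w$. The NumPy sketch below (data, initialization, stepsizes, and iteration budget are assumptions for illustration, not the paper's experiments) fits an underdetermined least-squares problem with two different stepsizes and reports the $\ell_1$ norm of the recovered predictor, one simple lens on how the stepsize can change the solution GD selects.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 10, 40                                  # underdetermined: fewer samples than features
    X = rng.normal(size=(n, d))
    w_star = np.zeros(d)
    w_star[:3] = [2.0, -1.5, 1.0]                  # sparse ground-truth predictor
    y = X @ w_star

    def run_gd(eta, steps=50000, init=0.5):
        # Diagonal linear network: predictor w = u * v, squared loss, GD on (u, v).
        u = np.full(d, init)
        v = np.full(d, init)
        for _ in range(steps):
            g = X.T @ (X @ (u * v) - y) / n        # gradient w.r.t. the product w = u * v
            u, v = u - eta * g * v, v - eta * g * u
        return u * v

    for eta in (0.005, 0.05):
        w = run_gd(eta)
        print(f"eta={eta:6.3f}  residual={np.linalg.norm(X @ w - y):.2e}  "
              f"||w||_1={np.abs(w).sum():.3f}")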
Large Stepsize Gradient Descent for Logistic Loss: Non-Monotonicity of the Loss Improves Optimization Efficiency
We consider gradient descent (GD) with a constant stepsize applied to logistic regression with linearly separable data, where the constant stepsize $\eta$ is so large that …
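The regime described here is easy to probe on a toy problem. The following deterministic sketch (the dataset, stepsize, and iteration count are assumptions made for illustration, not the paper's setup) runs constant-stepsize GD on the logistic loss over separable 2-D data and prints the loss each step, so the non-monotone behavior the title refers to can be inspected for a large $\eta$.

    import numpy as np

    # 21 points, all labeled +1: one "outlier" direction plus 20 copies of another.
    # The data are linearly separable (e.g. by w = (1, 0)), but a large first GD
    # step can overshoot in a direction that misclassifies the outlier.
    X = np.array([[1.0, 10.0]] + [[1.0, -1.0]] * 20)
    y = np.ones(len(X))

    def loss(w):
        margins = y * (X @ w)
        return np.mean(np.logaddexp(0.0, -margins))   # mean log(1 + exp(-margin))

    def grad(w):
        margins = y * (X @ w)
        p = 1.0 / (1.0 + np.exp(margins))              # sigmoid(-margin)
        return -(X.T @ (p * y)) / len(y)

    eta = 10.0                                          # large constant stepsize
    w = np.zeros(2)
    for t in range(15):
        print(f"t={t:2d}  loss={loss(w):.4f}")
        w -= eta * grad(w)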
Two sides of one coin: the limits of untuned SGD and the power of adaptive methods
The classical analysis of Stochastic Gradient Descent (SGD) with polynomially decaying stepsize $\eta_t = \eta/\sqrt{t}$ relies on well-tuned $\eta$ depending on problem …
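The two update rules being contrasted are easy to place side by side. The sketch below is a toy least-squares comparison (the problem, base stepsizes, and iteration counts are all arbitrary choices for illustration, and it demonstrates only the update rules, not the paper's guarantees): untuned SGD with $\eta_t = \eta/\sqrt{t}$ against AdaGrad, whose per-coordinate stepsizes adapt to accumulated squared gradients.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d = 200, 5
    X = rng.normal(size=(n, d))
    w_star = rng.normal(size=d)
    y = X @ w_star + 0.1 * rng.normal(size=n)

    def stoch_grad(w, i):
        # Single-sample least-squares gradient.
        return (X[i] @ w - y[i]) * X[i]

    def run_sgd(eta0, steps=5000):
        # "Untuned" SGD: eta_t = eta0 / sqrt(t), with eta0 fixed in advance.
        w = np.zeros(d)
        for t in range(1, steps + 1):
            i = rng.integers(n)
            w -= eta0 / np.sqrt(t) * stoch_grad(w, i)
        return w

    def run_adagrad(eta0, steps=5000, eps=1e-8):
        # AdaGrad: divide by the root of the running sum of squared gradients.
        w = np.zeros(d)
        acc = np.zeros(d)
        for _ in range(steps):
            i = rng.integers(n)
            g = stoch_grad(w, i)
            acc += g * g
            w -= eta0 / (np.sqrt(acc) + eps) * g
        return w

    for eta0 in (0.001, 0.1):
        mse_sgd = np.mean((X @ run_sgd(eta0) - y) ** 2)
        mse_ada = np.mean((X @ run_adagrad(eta0) - y) ** 2)
        print(f"eta0={eta0:5.3f}  SGD mse={mse_sgd:.4f}  AdaGrad mse={mse_ada:.4f}")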
Adaptive gradient methods at the edge of stability
Very little is known about the training dynamics of adaptive gradient methods like Adam in
deep learning. In this paper, we shed light on the behavior of these algorithms in the full …
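For reference, the update rule under discussion: a minimal full-batch Adam loop in NumPy (the quadratic test loss, learning rate, and iteration count are assumptions for illustration; only the update itself follows Kingma & Ba, 2015).

    import numpy as np

    def adam_step(w, g, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
        # One Adam update with bias correction (Kingma & Ba, 2015).
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g * g
        m_hat = m / (1 - beta1 ** t)
        v_hat = v / (1 - beta2 ** t)
        return w - lr * m_hat / (np.sqrt(v_hat) + eps), m, v

    # Full-batch usage on a toy quadratic loss L(w) = 0.5 * w' A w.
    A = np.diag([100.0, 1.0])             # ill-conditioned quadratic (assumed test problem)
    w = np.array([1.0, 1.0])
    m = np.zeros_like(w)
    v = np.zeros_like(w)
    for t in range(1, 2001):
        g = A @ w                          # full-batch (deterministic) gradient
        w, m, v = adam_step(w, g, m, v, t)
        if t % 400 == 0:
            print(f"t={t:4d}  loss={0.5 * w @ A @ w:.6f}")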
Implicit bias of gradient descent for logistic regression at the edge of stability
Recent research has observed that in machine learning optimization, gradient descent (GD) often operates at the edge of stability (EoS) [Cohen et al., 2021], where the stepsizes are set …
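For background on the classical small-stepsize picture that this EoS-regime setting revisits: for linearly separable data, GD on the logistic loss drives $\|w_t\| \to \infty$ while its direction converges to the maximum-margin separator (Soudry et al., 2018),

$$\frac{w_t}{\|w_t\|} \;\to\; \frac{\hat w}{\|\hat w\|}, \qquad \hat w = \arg\min_{w} \|w\|_2^2 \ \ \text{s.t.}\ \ y_i \langle w, x_i\rangle \ge 1 \ \text{for all } i,$$

and the question in the large-stepsize regime is how this implicit bias behaves when the loss is no longer monotonically decreasing.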