High-dimensional limit theorems for SGD: Effective dynamics and critical scaling
We study the scaling limits of stochastic gradient descent (SGD) with constant step-size in
the high-dimensional regime. We prove limit theorems for the trajectories of summary …
Learning threshold neurons via edge of stability
Existing analyses of neural network training often operate under the unrealistic assumption
of an extremely small learning rate. This lies in stark contrast to practical wisdom and …
Large Stepsize Gradient Descent for Logistic Loss: Non-Monotonicity of the Loss Improves Optimization Efficiency
We consider gradient descent (GD) with a constant stepsize applied to logistic
regression with linearly separable data, where the constant stepsize $\eta$ is so large that …
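The entry above concerns GD with a large constant stepsize on logistic regression over linearly separable data. As a rough, self-contained illustration of that setup (not code from the paper), the sketch below runs such a GD on synthetic separable data and prints the loss at every early step; the data, dimensions, and the stepsize value are all assumptions, and how non-monotone the printed losses look depends on them.

# Illustrative sketch only (not code from the paper above): constant-stepsize GD on the
# logistic loss with linearly separable data. Data, dimensions, and the stepsize are assumed.
import numpy as np

rng = np.random.default_rng(0)

# Toy separable data: labels are the sign of a fixed linear function of Gaussian inputs.
n, d = 50, 2
w_star = np.array([1.0, -1.0])
X = rng.normal(size=(n, d))
y = np.sign(X @ w_star)
y[y == 0] = 1.0

def logistic_loss(w):
    # Empirical logistic loss: (1/n) * sum_i log(1 + exp(-y_i <x_i, w>)).
    return np.mean(np.logaddexp(0.0, -y * (X @ w)))

def logistic_grad(w):
    # Gradient of the empirical logistic loss, written in a numerically stable form.
    margins = y * (X @ w)
    coef = -y * 0.5 * (1.0 - np.tanh(margins / 2.0))  # equals -y_i * sigmoid(-margin_i)
    return (coef[:, None] * X).mean(axis=0)

eta = 20.0  # constant stepsize, deliberately large (assumed value)
w = np.zeros(d)
for t in range(100):
    if t < 10 or t % 20 == 0:
        print(f"step {t:3d}  loss {logistic_loss(w):.5f}")
    w = w - eta * logistic_grad(w)

Increasing eta (or redrawing the data) pushes the run further into the large-stepsize regime studied there, where the per-step loss is no longer guaranteed to shrink.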
Large stepsize gradient descent for non-homogeneous two-layer networks: Margin improvement and fast optimization
The typical training of neural networks using large stepsize gradient descent (GD) under the
logistic loss often involves two distinct phases, where the empirical risk oscillates in the first …
(S)GD over Diagonal Linear Networks: Implicit bias, Large Stepsizes and Edge of Stability
In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit
regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over $2 …
Bifurcations and loss jumps in RNN training
Recurrent neural networks (RNNs) are popular machine learning tools for modeling and
forecasting sequential data and for inferring dynamical systems (DS) from observed time …
(S)GD over diagonal linear networks: implicit regularisation, large stepsizes and edge of stability
In this paper, we investigate the impact of stochasticity and large stepsizes on the implicit
regularisation of gradient descent (GD) and stochastic gradient descent (SGD) over …
Implicit bias of gradient descent for logistic regression at the edge of stability
Recent research has observed that in machine learning optimization, gradient descent (GD)
often operates at the edge of stability (EoS) [Cohen et al., 2021], where the stepsizes are set …
Understanding multi-phase optimization dynamics and rich nonlinear behaviors of ReLU networks
The training process of ReLU neural networks often exhibits complicated nonlinear
phenomena. The nonlinearity of models and non-convexity of loss pose significant …
Gradient descent monotonically decreases the sharpness of gradient flow solutions in scalar networks and beyond
Recent research shows that when Gradient Descent (GD) is applied to neural networks, the
loss almost never decreases monotonically. Instead, the loss oscillates as gradient descent …
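Several of the entries above study how the loss oscillates while the sharpness (the largest Hessian eigenvalue) shrinks. As a minimal illustration of that picture (not a construction from any of the papers), the sketch below runs GD on a two-parameter scalar network f(a, b) = a * b fit to a single target and tracks both quantities; the target, initialisation, and stepsize are assumed values chosen so that eta times the initial sharpness exceeds 2.

# Illustrative sketch only: GD on a two-parameter scalar network, tracking loss and sharpness.
import numpy as np

target = 1.0
a, b = 3.0, 0.05   # unbalanced initialisation (assumed)
eta = 0.3          # stepsize for which eta * initial sharpness exceeds 2 (assumed)

def loss(a, b):
    # Squared-error loss of the scalar network a * b against a single target.
    return 0.5 * (a * b - target) ** 2

def grads(a, b):
    # Partial derivatives of the loss with respect to a and b.
    r = a * b - target
    return r * b, r * a

def sharpness(a, b):
    # Largest eigenvalue of the Hessian [[b^2, 2ab - target], [2ab - target, a^2]].
    off = 2.0 * a * b - target
    H = np.array([[b * b, off], [off, a * a]])
    return np.linalg.eigvalsh(H)[-1]

for t in range(12):
    print(f"step {t:2d}  loss {loss(a, b):8.4f}  sharpness {sharpness(a, b):6.3f}")
    ga, gb = grads(a, b)
    a, b = a - eta * ga, b - eta * gb

With these assumed values the loss rises over the first couple of steps before converging, while the sharpness at the end of the run sits well below its initial value, loosely mirroring the oscillating-loss, decreasing-sharpness behaviour the snippets above describe.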