SGD with large step sizes learns sparse features
We showcase important features of the dynamics of stochastic gradient descent (SGD)
in the training of neural networks. We present empirical observations that commonly used …
Acceleration by Stepsize Hedging: Multi-Step Descent and the Silver Stepsize Schedule
Can we accelerate the convergence of gradient descent without changing the algorithm—
just by judiciously choosing stepsizes? Surprisingly, we show that the answer is yes. Our …
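A minimal sketch of the idea in the entry above: run plain gradient descent but feed it a nonconstant stepsize schedule instead of a constant one. The quadratic, the alternating long/short steps, and the helper name are illustrative assumptions for this sketch, not the paper's actual silver schedule.

```python
import numpy as np

def gd_with_schedule(grad, x0, steps):
    """Gradient descent where each iteration uses its own stepsize."""
    x = np.asarray(x0, dtype=float)
    for eta in steps:
        x = x - eta * grad(x)
    return x

# Toy quadratic f(x) = 0.5 * x^T A x with condition number 10.
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x
x0 = np.ones(2)

# Hypothetical alternating long/short schedule (illustration only,
# not the recursively defined silver schedule from the paper).
schedule = [0.18, 0.02] * 20
x_alt = gd_with_schedule(grad, x0, schedule)

# Constant-stepsize baseline with the same iteration budget.
x_const = gd_with_schedule(grad, x0, [0.1] * 40)
```

Both runs converge on this toy problem; the point is only that the iteration itself is unchanged and all the freedom sits in the schedule.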
Provably faster gradient descent via long steps
B Grimmer - SIAM Journal on Optimization, 2024 - SIAM
This work establishes new convergence guarantees for gradient descent in smooth convex
optimization via a computer-assisted analysis technique. Our theory allows nonconstant …
Super-acceleration with cyclical step-sizes
We develop a convergence-rate analysis of momentum with cyclical step-sizes. We show
that under some assumptions on the spectral gap of Hessians in machine learning, cyclical …
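A hedged sketch of the setup above: heavy-ball momentum whose stepsize cycles through a short list of values, applied to a quadratic whose Hessian spectrum has two well-separated clusters (the spectral-gap situation the snippet mentions). The eigenvalues, the two stepsizes, and the momentum value are illustrative choices, not taken from the paper's analysis.

```python
import numpy as np

def heavy_ball_cyclic(grad, x0, etas, beta, iters):
    """Heavy-ball momentum where the stepsize cycles through `etas`."""
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()
    for k in range(iters):
        eta = etas[k % len(etas)]  # cycle: eta_0, eta_1, eta_0, ...
        x, x_prev = x - eta * grad(x) + beta * (x - x_prev), x
    return x

# Quadratic with two eigenvalue clusters separated by a spectral gap.
A = np.diag([1.0, 1.1, 9.0, 10.0])
grad = lambda x: A @ x

x = heavy_ball_cyclic(grad, np.ones(4), etas=[0.15, 0.05], beta=0.5,
                      iters=200)
```

With these particular values the two-step cycle contracts every eigencomponent, so the iterates converge to the minimizer at the origin.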
The curse of unrolling: Rate of differentiating through optimization
Computing the Jacobian of the solution of an optimization problem is a central problem in
machine learning, with applications in hyperparameter optimization, meta-learning …
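One concrete way to see the unrolling question above: differentiate the T-step gradient-descent iterate with respect to its stepsize by propagating dx/d(eta) forward through the loop alongside the iterate itself. The quadratic objective and the forward-mode recursion below are an illustrative sketch under those assumptions, not the paper's rate analysis.

```python
import numpy as np

A = np.diag([1.0, 4.0])        # quadratic f(x) = 0.5 * x^T A x
x0 = np.array([1.0, 1.0])

def unrolled_gd_and_hypergrad(eta, T):
    """Unroll T GD steps on f and carry dx/d(eta) through each step.

    One step is x_{k+1} = x_k - eta * A x_k, so by the chain rule
    dx_{k+1}/d(eta) = (I - eta*A) dx_k/d(eta) - A x_k.
    """
    x = x0.copy()
    dx = np.zeros_like(x)       # derivative of the iterate w.r.t. eta
    for _ in range(T):
        g = A @ x               # gradient of the quadratic at x_k
        dx = dx - eta * (A @ dx) - g
        x = x - eta * g
    return x, dx

# Hypergradient of the final loss f(x_T) with respect to the stepsize.
eta, T = 0.1, 30
x, dx = unrolled_gd_and_hypergrad(eta, T)
hypergrad = (A @ x) @ dx        # grad f(x_T) dotted with dx_T/d(eta)
```

At this stepsize GD is still in its stable regime, so a slightly larger step would decrease the final loss and the hypergradient comes out negative.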
Fractal structure and generalization properties of stochastic optimization algorithms
Understanding generalization in deep learning has been one of the major challenges in
statistical learning theory over the last decade. While recent work has illustrated that the …
Chaotic regularization and heavy-tailed limits for deterministic gradient descent
Recent studies have shown that gradient descent (GD) can achieve improved generalization
when its dynamics exhibit chaotic behavior. However, to obtain the desired effect, the …
Predictive path coordination of collaborative transportation multirobot system in a smart factory
Smart factories employ intelligent transportation systems such as autonomous mobile robots
(AMRs) to support real-time adjusted production flows for agile and flexible production …
From stability to chaos: Analyzing gradient descent dynamics in quadratic regression
We conduct a comprehensive investigation into the dynamics of gradient descent using
large-order constant step-sizes in the context of quadratic regression models. Within this …
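The stability boundary that separates the regimes in the entry above can already be seen in one dimension: GD on f(x) = lam * x^2 / 2 contracts exactly when the stepsize eta satisfies |1 - eta*lam| < 1, i.e. eta < 2/lam. The 1-D linear toy below (with lam = 1) illustrates the three regimes; it is not the paper's quadratic regression model.

```python
import numpy as np

def gd_traj(eta, T, lam=1.0, x0=1.0):
    """Iterate 1-D GD x <- x - eta * lam * x and record |x_t|."""
    x, traj = x0, []
    for _ in range(T):
        x = x - eta * lam * x
        traj.append(abs(x))
    return traj

stable   = gd_traj(eta=1.5, T=50)  # |1 - 1.5| = 0.5 < 1: converges
boundary = gd_traj(eta=2.0, T=50)  # |1 - 2.0| = 1: bounces forever
unstable = gd_traj(eta=2.5, T=50)  # |1 - 2.5| = 1.5 > 1: diverges
```

Below 2/lam the iterates shrink geometrically, exactly at 2/lam they oscillate with constant amplitude, and above it they blow up.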
Understanding the Generalization Benefits of Late Learning Rate Decay
Why does training neural networks with large learning rates for a longer time often lead to better
generalization? In this paper, we delve into this question by examining the relation between …