SGD with large step sizes learns sparse features

M Andriushchenko, AV Varre… - International …, 2023 - proceedings.mlr.press
We showcase important features of the dynamics of the Stochastic Gradient Descent (SGD)
in the training of neural networks. We present empirical observations that commonly used …

Acceleration by Stepsize Hedging: Multi-Step Descent and the Silver Stepsize Schedule

J Altschuler, P Parrilo - Journal of the ACM, 2023 - dl.acm.org
Can we accelerate the convergence of gradient descent without changing the algorithm—
just by judiciously choosing stepsizes? Surprisingly, we show that the answer is yes. Our …
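The claim that acceleration can come from stepsize choice alone can be seen in a toy case (a minimal sketch, not the paper's silver stepsize schedule): on a diagonal quadratic with two eigenvalues, cycling through the stepsizes 1/λᵢ annihilates each eigenmode exactly in two steps, while the best constant stepsize only contracts geometrically.

```python
# Gradient descent on the diagonal quadratic f(x) = 0.5 * sum_i lam[i] * x[i]^2.
# A step with stepsize g maps each mode as x[i] <- (1 - g * lam[i]) * x[i].

def gd(x, lams, stepsizes):
    for g in stepsizes:
        x = [(1 - g * lam) * xi for lam, xi in zip(lams, x)]
    return x

lams = [1.0, 10.0]  # Hessian eigenvalues mu = 1, L = 10
x0 = [1.0, 1.0]

# Nonconstant schedule: stepsize 1/lam_i zeroes mode i, so two steps suffice.
x_sched = gd(x0, lams, [1.0, 0.1])

# Best constant stepsize 2/(mu + L) contracts by (L - mu)/(L + mu) per step.
x_const = gd(x0, lams, [2 / 11] * 2)

print(x_sched)                        # both modes exactly zero
print(max(abs(v) for v in x_const))   # still far from the optimum
```

The exact two-step kill is special to quadratics with known spectrum; the paper's contribution is achieving provable acceleration for general smooth convex functions without such knowledge.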

Provably faster gradient descent via long steps

B Grimmer - SIAM Journal on Optimization, 2024 - SIAM
This work establishes new convergence guarantees for gradient descent in smooth convex
optimization via a computer-assisted analysis technique. Our theory allows nonconstant …

Super-acceleration with cyclical step-sizes

B Goujaud, D Scieur, A Dieuleveut… - International …, 2022 - proceedings.mlr.press
We develop a convergence-rate analysis of momentum with cyclical step-sizes. We show
that under some assumption on the spectral gap of Hessians in machine learning, cyclical …
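The setting can be sketched in a few lines (an illustrative toy, not the paper's analysis; the stepsize pair and momentum value below are arbitrary choices): heavy-ball momentum with a stepsize that cycles between two values, on a quadratic whose Hessian spectrum has two well-separated clusters.

```python
# Heavy-ball momentum with a cyclical stepsize on a diagonal quadratic
# f(x) = 0.5 * sum_i lam[i] * x[i]^2. Update per mode:
#   x <- x - g * lam * x + beta * (x - x_prev)

def heavy_ball_cyclical(x0, lams, stepsizes, beta, iters):
    x_prev, x = list(x0), list(x0)
    for t in range(iters):
        g = stepsizes[t % len(stepsizes)]  # cycle through the stepsize list
        x_next = [xi - g * lam * xi + beta * (xi - xpi)
                  for lam, xi, xpi in zip(lams, x, x_prev)]
        x_prev, x = x, x_next
    return x

lams = [1.0, 10.0]  # two-cluster spectrum with a gap
x = heavy_ball_cyclical([1.0, 1.0], lams, [0.15, 0.05], beta=0.5, iters=300)
print(max(abs(v) for v in x))  # converged close to the minimizer at 0
```

With these particular constants each eigenmode's two-step transfer matrix is a contraction, so the iterates converge; the paper's point is that exploiting a spectral gap this way can beat the best single constant stepsize.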

The curse of unrolling: Rate of differentiating through optimization

D Scieur, G Gidel, Q Bertrand… - Advances in Neural …, 2022 - proceedings.neurips.cc
Computing the Jacobian of the solution of an optimization problem is a central problem in
machine learning, with applications in hyperparameter optimization, meta-learning …
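Unrolling can be made concrete in one dimension (a sketch assuming a simple quadratic, not the paper's general setting): for GD on f(x) = ½(x − θ)², the Jacobian Jₜ = dxₜ/dθ of the iterate with respect to the hyperparameter obeys a linear recursion alongside the iterates themselves, and approaches the Jacobian of the true solution, dθ/dθ = 1.

```python
# Differentiating GD iterates w.r.t. a parameter theta by unrolling.
# Objective: f(x) = 0.5 * (x - theta)^2, GD: x <- x - gamma * (x - theta).
# The Jacobian J_t = dx_t/dtheta obeys J_{t+1} = (1 - gamma) * J_t + gamma.

def unrolled_jacobian(gamma, T, x0=0.0, theta=3.0):
    x, J = x0, 0.0  # x0 does not depend on theta, so J_0 = 0
    for _ in range(T):
        x = x - gamma * (x - theta)
        J = (1 - gamma) * J + gamma
    return x, J

x_T, J_T = unrolled_jacobian(gamma=0.1, T=50)
# Closed form: J_T = 1 - (1 - gamma)^T, converging to 1 as T grows.
print(J_T, 1 - 0.9 ** 50)
```

The geometric factor (1 − γ)ᵀ in the closed form is exactly the kind of rate the paper studies: how fast (or slowly) the unrolled derivative approaches the derivative of the limiting solution.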

Fractal structure and generalization properties of stochastic optimization algorithms

A Camuto, G Deligiannidis… - Advances in …, 2021 - proceedings.neurips.cc
Understanding generalization in deep learning has been one of the major challenges in
statistical learning theory over the last decade. While recent work has illustrated that the …

Chaotic regularization and heavy-tailed limits for deterministic gradient descent

SH Lim, Y Wan, U Simsekli - Advances in Neural …, 2022 - proceedings.neurips.cc
Recent studies have shown that gradient descent (GD) can achieve improved generalization
when its dynamics exhibits a chaotic behavior. However, to obtain the desired effect, the …

Predictive path coordination of collaborative transportation multirobot system in a smart factory

Z Nie, KC Chen - IEEE Transactions on Systems, Man, and …, 2024 - ieeexplore.ieee.org
Smart factories employ intelligent transportation systems such as autonomous mobile robots
(AMRs) to support real-time adjusted production flows for agile and flexible production …

From stability to chaos: Analyzing gradient descent dynamics in quadratic regression

X Chen, K Balasubramanian, P Ghosal… - arXiv preprint arXiv …, 2023 - arxiv.org
We conduct a comprehensive investigation into the dynamics of gradient descent using
large-order constant step-sizes in the context of quadratic regression models. Within this …
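The stability-to-chaos transition has a one-line caricature (a sketch for the simplest scalar quadratic, not the paper's quadratic-regression model): for f(x) = ½Lx², GD multiplies the iterate by (1 − γL) each step, so γ < 2/L contracts and γ > 2/L diverges.

```python
# Edge-of-stability caricature: GD on f(x) = 0.5 * L * x^2 iterates
# x <- (1 - gamma * L) * x, stable iff |1 - gamma * L| < 1, i.e. gamma < 2/L.

def gd_scalar(gamma, L=1.0, x0=1.0, iters=30):
    x = x0
    for _ in range(iters):
        x *= (1 - gamma * L)
    return x

stable = gd_scalar(gamma=1.5)    # |1 - 1.5| = 0.5 -> geometric convergence
unstable = gd_scalar(gamma=2.5)  # |1 - 2.5| = 1.5 -> geometric divergence
print(abs(stable), abs(unstable))
```

In the quadratic-regression models the paper studies, the effective curvature itself evolves with the iterates, which is what produces the richer dynamics (including chaos) beyond this simple converge-or-diverge dichotomy.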

Understanding the Generalization Benefits of Late Learning Rate Decay

Y Ren, C Ma, L Ying - International Conference on Artificial …, 2024 - proceedings.mlr.press
Why do neural networks trained with large learning rates for longer time often lead to better
generalization? In this paper, we delve into this question by examining the relation between …