SGD with large step sizes learns sparse features
We showcase important features of the dynamics of stochastic gradient descent (SGD)
in the training of neural networks. We present empirical observations that commonly used …
Acceleration by Stepsize Hedging: Multi-Step Descent and the Silver Stepsize Schedule
Can we accelerate the convergence of gradient descent without changing the algorithm—
just by judiciously choosing stepsizes? Surprisingly, we show that the answer is yes. Our …
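A minimal sketch of the idea in the entry above: run plain gradient descent but feed it a nonconstant stepsize schedule instead of a constant one. The quadratic, the alternating long/short steps, and the helper name are illustrative assumptions for this sketch, not the paper's actual silver schedule.

```python
import numpy as np

def gd_with_schedule(grad, x0, steps):
    """Gradient descent where each iteration uses its own stepsize."""
    x = np.asarray(x0, dtype=float)
    for eta in steps:
        x = x - eta * grad(x)
    return x

# Toy quadratic f(x) = 0.5 * x^T A x with condition number 10.
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x
x0 = np.ones(2)

# Hypothetical alternating long/short schedule (illustration only,
# not the recursively defined silver schedule from the paper).
schedule = [0.18, 0.02] * 20
x_alt = gd_with_schedule(grad, x0, schedule)

# Constant-stepsize baseline with the same iteration budget.
x_const = gd_with_schedule(grad, x0, [0.1] * 40)
```

Both runs converge on this toy problem; the point is only that the iteration itself is unchanged and all the freedom sits in the schedule.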
Provably faster gradient descent via long steps
B Grimmer - SIAM Journal on Optimization, 2024 - SIAM
This work establishes new convergence guarantees for gradient descent in smooth convex
optimization via a computer-assisted analysis technique. Our theory allows nonconstant …
Super-acceleration with cyclical step-sizes
We develop a convergence-rate analysis of momentum with cyclical step-sizes. We show
that under some assumptions on the spectral gap of Hessians in machine learning, cyclical …
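A hedged sketch of the setup above: heavy-ball momentum whose stepsize cycles through a short list of values, applied to a quadratic whose Hessian spectrum has two well-separated clusters (the spectral-gap situation the snippet mentions). The eigenvalues, the two stepsizes, and the momentum value are illustrative choices, not taken from the paper's analysis.

```python
import numpy as np

def heavy_ball_cyclic(grad, x0, etas, beta, iters):
    """Heavy-ball momentum where the stepsize cycles through `etas`."""
    x_prev = np.asarray(x0, dtype=float)
    x = x_prev.copy()
    for k in range(iters):
        eta = etas[k % len(etas)]  # cycle: eta_0, eta_1, eta_0, ...
        x, x_prev = x - eta * grad(x) + beta * (x - x_prev), x
    return x

# Quadratic with two eigenvalue clusters separated by a spectral gap.
A = np.diag([1.0, 1.1, 9.0, 10.0])
grad = lambda x: A @ x

x = heavy_ball_cyclic(grad, np.ones(4), etas=[0.15, 0.05], beta=0.5,
                      iters=200)
```

With these particular values the two-step cycle contracts every eigencomponent, so the iterates converge to the minimizer at the origin.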
The curse of unrolling: Rate of differentiating through optimization
Computing the Jacobian of the solution of an optimization problem is a central problem in
machine learning, with applications in hyperparameter optimization, meta-learning …
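One concrete way to see the unrolling question above: differentiate the T-step gradient-descent iterate with respect to its stepsize by propagating dx/d(eta) forward through the loop alongside the iterate itself. The quadratic objective and the forward-mode recursion below are an illustrative sketch under those assumptions, not the paper's rate analysis.

```python
import numpy as np

A = np.diag([1.0, 4.0])        # quadratic f(x) = 0.5 * x^T A x
x0 = np.array([1.0, 1.0])

def unrolled_gd_and_hypergrad(eta, T):
    """Unroll T GD steps on f and carry dx/d(eta) through each step.

    One step is x_{k+1} = x_k - eta * A x_k, so by the chain rule
    dx_{k+1}/d(eta) = (I - eta*A) dx_k/d(eta) - A x_k.
    """
    x = x0.copy()
    dx = np.zeros_like(x)       # derivative of the iterate w.r.t. eta
    for _ in range(T):
        g = A @ x               # gradient of the quadratic at x_k
        dx = dx - eta * (A @ dx) - g
        x = x - eta * g
    return x, dx

# Hypergradient of the final loss f(x_T) with respect to the stepsize.
eta, T = 0.1, 30
x, dx = unrolled_gd_and_hypergrad(eta, T)
hypergrad = (A @ x) @ dx        # grad f(x_T) dotted with dx_T/d(eta)
```

At this stepsize GD is still in its stable regime, so a slightly larger step would decrease the final loss and the hypergradient comes out negative.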
Fractal structure and generalization properties of stochastic optimization algorithms
Understanding generalization in deep learning has been one of the major challenges in
statistical learning theory over the last decade. While recent work has illustrated that the …
Chaotic regularization and heavy-tailed limits for deterministic gradient descent
Recent studies have shown that gradient descent (GD) can achieve improved generalization
when its dynamics exhibit chaotic behavior. However, to obtain the desired effect, the …
Predictive path coordination of collaborative transportation multirobot system in a smart factory
Smart factories employ intelligent transportation systems such as autonomous mobile robots
(AMRs) to support real-time adjusted production flows for agile and flexible production …
From stability to chaos: Analyzing gradient descent dynamics in quadratic regression
We conduct a comprehensive investigation into the dynamics of gradient descent using
large-order constant step-sizes in the context of quadratic regression models. Within this …
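The stability boundary that separates the regimes in the entry above can already be seen in one dimension: GD on f(x) = lam * x^2 / 2 contracts exactly when the stepsize eta satisfies |1 - eta*lam| < 1, i.e. eta < 2/lam. The 1-D linear toy below (with lam = 1) illustrates the three regimes; it is not the paper's quadratic regression model.

```python
import numpy as np

def gd_traj(eta, T, lam=1.0, x0=1.0):
    """Iterate 1-D GD x <- x - eta * lam * x and record |x_t|."""
    x, traj = x0, []
    for _ in range(T):
        x = x - eta * lam * x
        traj.append(abs(x))
    return traj

stable   = gd_traj(eta=1.5, T=50)  # |1 - 1.5| = 0.5 < 1: converges
boundary = gd_traj(eta=2.0, T=50)  # |1 - 2.0| = 1: bounces forever
unstable = gd_traj(eta=2.5, T=50)  # |1 - 2.5| = 1.5 > 1: diverges
```

Below 2/lam the iterates shrink geometrically, exactly at 2/lam they oscillate with constant amplitude, and above it they blow up.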
Understanding the Generalization Benefits of Late Learning Rate Decay
Why does training neural networks with large learning rates for a longer time often lead to better
generalization? In this paper, we delve into this question by examining the relation between …