An SDE for Modeling SAM: Theory and Insights

EM Compagnoni, L Biggio, A Orvieto… - International …, 2023 - proceedings.mlr.press
We study the SAM (Sharpness-Aware Minimization) optimizer, which has recently attracted a
lot of interest due to its improved performance over more classical variants of stochastic …
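For context on the method analyzed in this entry, here is a minimal sketch of the SAM two-step update: ascend along the normalized gradient to a perturbed point within a radius rho, then apply the gradient computed at that point to the original weights. The toy loss, radius, and step size below are illustrative assumptions, not details taken from the paper.

```python
import numpy as np

def sam_step(w, grad_fn, rho=0.05, lr=0.1):
    """One SAM update: perturb the weights toward the locally worst-case point
    within an L2 ball of radius rho, then descend using the gradient there."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)  # sharpness-seeking perturbation
    g_adv = grad_fn(w + eps)                     # gradient at the perturbed weights
    return w - lr * g_adv                        # descent step on the original weights

# Toy usage on a quadratic loss (illustrative only)
grad_fn = lambda w: 2.0 * w
w = np.array([1.0, -2.0])
for _ in range(50):
    w = sam_step(w, grad_fn)
```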

Adaptive Methods through the Lens of SDEs: Theoretical Insights on the Role of Noise

EM Compagnoni, T Liu, R Islamov, FN Proske… - arXiv preprint arXiv …, 2024 - arxiv.org
Despite the vast empirical evidence supporting the efficacy of adaptive optimization methods
in deep learning, their theoretical understanding is far from complete. This work introduces …

SDEs for Minimax Optimization

EM Compagnoni, A Orvieto, H Kersting… - International …, 2024 - proceedings.mlr.press
Minimax optimization problems have attracted a lot of attention over the past few years, with
applications ranging from economics to machine learning. While advanced optimization …

Stochastic Modified Flows for Riemannian Stochastic Gradient Descent

B Gess, S Kassing, N Rana - SIAM Journal on Control and Optimization, 2024 - SIAM
We give quantitative estimates for the rate of convergence of Riemannian stochastic
gradient descent (RSGD) to Riemannian gradient flow and to a diffusion process, the so …
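As a rough illustration of the kind of diffusion approximation studied in this line of work, the sketch below simulates the SDE $dX_t = -\nabla f(X_t)\,dt + \sqrt{\eta}\,\Sigma^{1/2}\,dW_t$ with Euler–Maruyama steps in the plain Euclidean setting. The constant noise covariance and toy objective are assumptions for illustration; the Riemannian construction of the paper is not reproduced here.

```python
import numpy as np

def simulate_sgd_sde(grad_f, sigma_sqrt, x0, lr=0.01, dt=0.01, n_steps=1000, seed=0):
    """Euler--Maruyama simulation of dX = -grad_f(X) dt + sqrt(lr) * sigma_sqrt dW,
    a Euclidean toy version of the diffusion approximation of SGD."""
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    path = [x.copy()]
    for _ in range(n_steps):
        dw = rng.normal(scale=np.sqrt(dt), size=x.shape)   # Brownian increment
        x = x - grad_f(x) * dt + np.sqrt(lr) * sigma_sqrt @ dw
        path.append(x.copy())
    return np.array(path)

# Toy usage: quadratic objective with isotropic gradient noise (illustrative)
grad_f = lambda x: x
path = simulate_sgd_sde(grad_f, sigma_sqrt=0.5 * np.eye(2), x0=[2.0, -1.0])
```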

Unlocking Optimal Batch Size Schedules Using Continuous-Time Control and Perturbation Theory

S Perko - arXiv preprint arXiv:2312.01898, 2023 - arxiv.org
Stochastic Gradient Descent (SGD) and its variants are almost universally used to train
neural networks and to fit a variety of other parametric models. An important hyperparameter …

Dynamics of Adaptive Momentum Optimizers on Challenging Deep Learning Landscapes

A Orvieto - 2023 - research-collection.ethz.ch
Deep learning technologies are skyrocketing in popularity across a wide range of domains,
with groundbreaking accomplishments in fields such as natural language processing …