Adaptive SGD with Polyak stepsize and line-search: Robust convergence and variance reduction

X Jiang, SU Stich - Advances in Neural Information …, 2024 - proceedings.neurips.cc
The recently proposed stochastic Polyak stepsize (SPS) and stochastic line-search (SLS) for
SGD have shown remarkable effectiveness when training over-parameterized models …
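
For reference, the stochastic Polyak stepsize named above is commonly written in the SPS_max form below (a sketch of the standard formulation from the SPS literature; the exact variant analyzed in this paper may differ):

\gamma_t = \min\left\{ \frac{f_{i_t}(x_t) - f_{i_t}^*}{c\,\|\nabla f_{i_t}(x_t)\|^2},\ \gamma_b \right\}

where f_{i_t} is the loss sampled at step t, f_{i_t}^* its minimum, c > 0 a scaling constant, and \gamma_b an upper bound on the stepsize.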

Prodigy: An expeditiously adaptive parameter-free learner

K Mishchenko, A Defazio - arXiv preprint arXiv:2306.06101, 2023 - arxiv.org
We consider the problem of estimating the learning rate in adaptive methods, such as
AdaGrad and Adam. We propose Prodigy, an algorithm that provably estimates the distance …
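
For context, the distance being estimated is D = \|x_0 - x_\star\|, from the initial point to a solution. The classical optimal stepsize for subgradient descent on a G-Lipschitz convex problem over T iterations depends on exactly this unknown quantity (a standard result, shown only to motivate the estimation problem; Prodigy's actual update is more involved):

\gamma = \frac{D}{G\sqrt{T}}, \qquad D = \|x_0 - x_\star\|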

DoWG unleashed: An efficient universal parameter-free gradient descent method

A Khaled, K Mishchenko, C Jin - Advances in Neural …, 2023 - proceedings.neurips.cc
This paper proposes a new easy-to-implement parameter-free gradient-based optimizer:
DoWG (Distance over Weighted Gradients). We prove that DoWG is efficient, matching the …
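
As a rough sketch, the distance-over-weighted-gradients stepsize has the form below (notation simplified; initialization constants are omitted and details may differ from the paper):

\bar r_t = \max_{k \le t} \|x_k - x_0\|, \qquad v_t = \sum_{k \le t} \bar r_k^2 \|g_k\|^2, \qquad x_{t+1} = x_t - \frac{\bar r_t^2}{\sqrt{v_t}}\, g_t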

Nest your adaptive algorithm for parameter-agnostic nonconvex minimax optimization

J Yang, X Li, N He - Advances in Neural Information …, 2022 - proceedings.neurips.cc
Adaptive algorithms like AdaGrad and AMSGrad are successful in nonconvex optimization
owing to their parameter-agnostic ability, requiring no a priori knowledge about problem …
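
The parameter-agnostic behavior alluded to here is exemplified by the AdaGrad-norm stepsize, which requires no prior knowledge of smoothness or noise parameters (a standard form, shown for context only):

\eta_t = \frac{\eta}{\sqrt{\sum_{k \le t} \|g_k\|^2}}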

Parameter-agnostic optimization under relaxed smoothness

F Hübler, J Yang, X Li, N He - International Conference on …, 2024 - proceedings.mlr.press
Tuning hyperparameters, such as the stepsize, presents a major challenge of training
machine learning models. To address this challenge, numerous adaptive optimization …
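
For context, "relaxed smoothness" in this line of work usually refers to the (L_0, L_1)-smoothness condition, under which the local smoothness constant may grow with the gradient norm (stated here in its common form; the paper's exact assumption may differ):

\|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\|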

Locally adaptive federated learning via stochastic Polyak stepsizes

S Mukherjee, N Loizou, SU Stich - arXiv preprint arXiv:2307.06306, 2023 - arxiv.org
State-of-the-art federated learning algorithms such as FedAvg require carefully tuned
stepsizes to achieve their best performance. The improvements proposed by existing …

Stochastic gradient descent with preconditioned Polyak step-size

F Abdukhakimov, C Xiang, D Kamzolov… - … and Mathematical Physics, 2024 - Springer
Stochastic Gradient Descent (SGD) is one of the many iterative optimization
methods that are widely used in solving machine learning problems. These methods display …
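
A minimal sketch of a preconditioned Polyak-type update, assuming a positive-definite preconditioner B_t (a generic form for illustration, not necessarily the exact method of the paper):

x_{t+1} = x_t - \gamma_t B_t^{-1} \nabla f_{i_t}(x_t), \qquad \gamma_t = \frac{f_{i_t}(x_t) - f_{i_t}^*}{\|\nabla f_{i_t}(x_t)\|_{B_t^{-1}}^2}

where \|v\|_{B^{-1}}^2 = v^\top B^{-1} v.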

MoMo: Momentum models for adaptive learning rates

F Schaipp, R Ohana, M Eickenberg, A Defazio… - arXiv preprint arXiv …, 2023 - arxiv.org
Training a modern machine learning architecture on a new task requires extensive learning-
rate tuning, which comes at a high computational cost. Here we develop new adaptive …

SANIA: Polyak-type optimization framework leads to scale invariant stochastic algorithms

F Abdukhakimov, C Xiang, D Kamzolov… - arXiv preprint arXiv …, 2023 - arxiv.org
Adaptive optimization methods are widely recognized as among the most popular
approaches for training Deep Neural Networks (DNNs). Techniques such as Adam …

Loss Landscape Characterization of Neural Networks without Over-Parametrization

R Islamov, N Ajroldi, A Orvieto, A Lucchi - arXiv preprint arXiv:2410.12455, 2024 - arxiv.org
Optimization methods play a crucial role in modern machine learning, powering the
remarkable empirical achievements of deep learning models. These successes are even …