Adaptive SGD with Polyak stepsize and line-search: Robust convergence and variance reduction
The recently proposed stochastic Polyak stepsize (SPS) and stochastic line-search (SLS) for
SGD have shown remarkable effectiveness when training over-parameterized models …
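For context, the stochastic Polyak stepsize in its commonly cited SPS_max form sets the stepsize from the current mini-batch loss and gradient. The Python sketch below is a minimal illustration of that rule, not code from the paper; the helpers `loss_fn` and `grad_fn`, the lower bound `f_star` (often taken as 0 for over-parameterized models), the scaling constant `c`, and the cap `gamma_max` are assumptions made for the example.

```python
import numpy as np

def sps_max_step(params, loss_fn, grad_fn, batch, f_star=0.0, c=0.5, gamma_max=1.0):
    """One SGD step with the SPS_max stepsize:
    gamma = min((f_i(x) - f_i^*) / (c * ||grad f_i(x)||^2), gamma_max).
    `loss_fn`/`grad_fn` evaluate the mini-batch loss and gradient (hypothetical helpers)."""
    loss = loss_fn(params, batch)
    grad = grad_fn(params, batch)
    grad_sq = float(np.dot(grad, grad)) + 1e-12   # guard against a zero gradient
    gamma = min((loss - f_star) / (c * grad_sq), gamma_max)
    return params - gamma * grad
```

The stochastic line-search (SLS) variant instead backtracks on the mini-batch loss until an Armijo-type condition holds, trading the need for f_star against extra function evaluations.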
Prodigy: An expeditiously adaptive parameter-free learner
We consider the problem of estimating the learning rate in adaptive methods, such as
AdaGrad and Adam. We propose Prodigy, an algorithm that provably estimates the distance …
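Prodigy belongs to a family of parameter-free methods that set the learning rate as an estimated distance to the solution divided by accumulated gradient magnitudes. The sketch below shows only that generic "distance over root-sum-of-squared-gradients" template with a fixed, user-supplied guess `d_hat`; Prodigy's actual contribution, a provable online estimator of this distance, is not reproduced here, and `grad_fn`/`data_iter` are hypothetical.

```python
import numpy as np

def distance_over_gradients_sgd(x0, grad_fn, data_iter, d_hat, steps=100):
    """Generic template behind distance-estimating parameter-free methods
    (simplified): eta_t = d_hat / sqrt(sum_k ||g_k||^2). Here `d_hat` stands
    in for the distance estimate that Prodigy learns online."""
    x, g_sq_sum = x0.copy(), 0.0
    for _ in range(steps):
        g = grad_fn(x, next(data_iter))
        g_sq_sum += float(np.dot(g, g))
        eta = d_hat / (np.sqrt(g_sq_sum) + 1e-12)
        x = x - eta * g
    return x
```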
DoWG unleashed: An efficient universal parameter-free gradient descent method
This paper proposes a new easy-to-implement parameter-free gradient-based optimizer:
DoWG (Distance over Weighted Gradients). We prove that DoWG is efficient---matching the …
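Reading the name literally, "distance over weighted gradients" suggests a stepsize built from the running distance to the initial point and a distance-weighted sum of squared gradient norms. The sketch below illustrates that rule under this reading; treat it as an approximation rather than a faithful copy of the paper's algorithm, with `grad_fn`, `data_iter`, and the initial radius `r_eps` assumed for the example.

```python
import numpy as np

def dowg_sgd(x0, grad_fn, data_iter, steps=100, r_eps=1e-4):
    """Distance-over-weighted-gradients stepsize (illustrative sketch):
    r_t   = max distance travelled from x0 (initialized to a small r_eps),
    v_t  += r_t^2 * ||g_t||^2,
    eta_t = r_t^2 / sqrt(v_t)."""
    x, r, v = x0.copy(), r_eps, 0.0
    for _ in range(steps):
        g = grad_fn(x, next(data_iter))
        r = max(r, float(np.linalg.norm(x - x0)))
        v += (r ** 2) * float(np.dot(g, g))
        eta = (r ** 2) / (np.sqrt(v) + 1e-12)
        x = x - eta * g
    return x
```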
Nest your adaptive algorithm for parameter-agnostic nonconvex minimax optimization
Adaptive algorithms like AdaGrad and AMSGrad are successful in nonconvex optimization
owing to their parameter-agnostic ability, requiring no a priori knowledge about problem …
Parameter-agnostic optimization under relaxed smoothness
Tuning hyperparameters, such as the stepsize, presents a major challenge in training
machine learning models. To address this challenge, numerous adaptive optimization …
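"Relaxed smoothness" in this line of work usually refers to the (L0, L1)-smoothness condition of Zhang et al., which bounds the Hessian norm by an affine function of the gradient norm instead of a single constant. The statement below is that standard definition, included for context rather than quoted from this paper.

```latex
% (L_0, L_1)-smoothness (relaxed smoothness): the local curvature may grow
% with the gradient norm; ordinary L-smoothness is recovered when L_1 = 0.
\[
  \|\nabla^2 f(x)\| \;\le\; L_0 + L_1\,\|\nabla f(x)\| \qquad \text{for all } x .
\]
```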
Locally adaptive federated learning via stochastic Polyak stepsizes
State-of-the-art federated learning algorithms such as FedAvg require carefully tuned
stepsizes to achieve their best performance. The improvements proposed by existing …
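One plausible way to pair FedAvg with stochastic Polyak stepsizes, consistent with the title but not taken from the paper, is to let each client compute the SPS stepsize locally during its updates and then average on the server as usual. The sketch below assumes the hypothetical `sps_max_step` helper from the SPS example above and a `client` object exposing `sample_batch()`, `loss_fn`, and `grad_fn`.

```python
import numpy as np

def fedavg_with_local_sps(global_params, clients, local_steps=5, c=0.5, gamma_max=1.0):
    """One FedAvg round where every client runs SGD with a locally computed
    SPS stepsize (illustrative sketch, not the paper's exact algorithm)."""
    updated = []
    for client in clients:
        x = global_params.copy()
        for _ in range(local_steps):
            batch = client.sample_batch()
            x = sps_max_step(x, client.loss_fn, client.grad_fn, batch,
                             c=c, gamma_max=gamma_max)
        updated.append(x)
    return np.mean(updated, axis=0)   # simple unweighted server average
```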
Stochastic gradient descent with preconditioned Polyak step-size
Stochastic Gradient Descent (SGD) is one of the many iterative optimization
methods that are widely used in solving machine learning problems. These methods display …
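The natural preconditioned variant of the Polyak step replaces the Euclidean gradient norm with the norm induced by a preconditioner D, for example a positive diagonal matrix. The sketch below shows that generic form only; the paper's particular choice of preconditioner is not reproduced, and `diag_precond`, `f_star`, and `gamma_max` are assumptions for the example.

```python
import numpy as np

def preconditioned_polyak_step(x, loss, grad, diag_precond, f_star=0.0, gamma_max=1.0):
    """Polyak step with a diagonal preconditioner D (illustrative sketch):
    gamma = min((f(x) - f^*) / ||grad||^2_{D^{-1}}, gamma_max),
    x_new = x - gamma * D^{-1} grad, where `diag_precond` is the (positive) diagonal of D."""
    d_inv_grad = grad / diag_precond                    # D^{-1} g
    weighted_sq_norm = float(np.dot(grad, d_inv_grad))  # ||g||^2_{D^{-1}}
    gamma = min((loss - f_star) / (weighted_sq_norm + 1e-12), gamma_max)
    return x - gamma * d_inv_grad
```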
Momo: Momentum models for adaptive learning rates
Training a modern machine learning architecture on a new task requires extensive learning-
rate tuning, which comes at a high computational cost. Here we develop new adaptive …
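A loose reading of the idea, not the paper's exact update, is to apply a Polyak-style cap to a momentum direction: keep exponential averages of past gradients and losses, and limit the learning rate using the averaged loss gap. The sketch below is a simplified illustration under that reading; `state`, `alpha`, `beta`, and `f_star` are assumptions for the example.

```python
import numpy as np

def momo_like_step(x, loss, grad, state, alpha=1.0, beta=0.9, f_star=0.0):
    """Simplified momentum-model step (illustrative, not Momo's exact rule):
    maintain EMAs of gradients and losses, then take
    eta = min(alpha, max(loss_avg - f_star, 0) / ||grad_avg||^2)
    and step along the averaged gradient."""
    state["d"] = beta * state.get("d", np.zeros_like(grad)) + (1 - beta) * grad
    state["f"] = beta * state.get("f", loss) + (1 - beta) * loss
    d = state["d"]
    eta = min(alpha, max(state["f"] - f_star, 0.0) / (float(np.dot(d, d)) + 1e-12))
    return x - eta * d, state
```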
SANIA: Polyak-type optimization framework leads to scale invariant stochastic algorithms
Adaptive optimization methods are widely recognized as among the most popular
approaches for training Deep Neural Networks (DNNs). Techniques such as Adam …
Loss Landscape Characterization of Neural Networks without Over-Parametrization
Optimization methods play a crucial role in modern machine learning, powering the
remarkable empirical achievements of deep learning models. These successes are even …