Robustness to unbounded smoothness of generalized SignSGD
Traditional analyses in non-convex optimization typically rely on the smoothness
assumption, namely requiring the gradients to be Lipschitz. However, recent evidence …
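For orientation (my own note, not taken from the abstract): the "unbounded smoothness" in the title presumably refers to the relaxed (L0, L1)-smoothness condition of Zhang et al. (2020), which weakens the standard Lipschitz-gradient assumption, and SignSGD refers to the coordinate-wise sign update. A minimal sketch, assuming these standard formulations:
\[ \|\nabla f(x) - \nabla f(y)\| \le L\,\|x - y\| \qquad \text{(standard $L$-smoothness)} \]
\[ \|\nabla^2 f(x)\| \le L_0 + L_1\,\|\nabla f(x)\| \qquad \text{(relaxed $(L_0, L_1)$-smoothness: the local smoothness may grow with the gradient norm)} \]
\[ x_{t+1} = x_t - \eta_t\,\mathrm{sign}(g_t) \qquad \text{(basic SignSGD step, of which the paper analyzes a generalization)} \]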
High-probability bounds for stochastic optimization and variational inequalities: the case of unbounded variance
In recent years, the interest of optimization and machine learning communities in
high-probability convergence of stochastic optimization methods has been growing. One of …
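Note (an assumption on my part, not from the snippet): "unbounded variance" here typically means heavy-tailed gradient noise with only a bounded central $\alpha$-th moment for some $\alpha \in (1, 2]$, under which the variance itself may be infinite, e.g.
\[ \mathbb{E}\big[\|\nabla f_{\xi}(x) - \nabla f(x)\|^{\alpha}\big] \le \sigma^{\alpha}, \qquad \alpha \in (1, 2]. \]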
High probability convergence of stochastic gradient methods
In this work, we describe a generic approach to show convergence with high probability for
both stochastic convex and non-convex optimization with sub-Gaussian noise. In previous …
Improved convergence in high probability of clipped gradient methods with heavy tailed noise
Momentum provably improves error feedback!
Due to the high communication overhead when training machine learning models in a
distributed environment, modern algorithms invariably rely on lossy communication …
Clipped stochastic methods for variational inequalities with heavy-tailed noise
Stochastic first-order methods such as Stochastic Extragradient (SEG) or Stochastic Gradient
Descent-Ascent (SGDA) for solving smooth minimax problems and, more generally …
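For context (a standard definition, not quoted from the paper): the clipping referred to in this title and in the clipped-gradient entries above is usually the norm-clipping operator, applied to the stochastic gradient (or operator) estimate before each step,
\[ \mathrm{clip}_{\lambda}(g) = \min\!\left(1, \frac{\lambda}{\|g\|}\right) g, \]
so that, for example, an SGD-type update becomes \( x_{t+1} = x_t - \eta\,\mathrm{clip}_{\lambda}(g_t) \).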
Methods for convex (L0, L1)-smooth optimization: clipping, acceleration, and adaptivity
Due to the non-smoothness of optimization problems in Machine Learning, generalized
smoothness assumptions have been gaining a lot of attention in recent years. One of the …
Federated learning with client subsampling, data heterogeneity, and unbounded smoothness: A new algorithm and lower bounds
We study the problem of Federated Learning (FL) under client subsampling and data
heterogeneity with an objective function that has potentially unbounded smoothness. This …
SGD with AdaGrad stepsizes: Full adaptivity with high probability to unknown parameters, unbounded gradients and affine variance
We study Stochastic Gradient Descent with AdaGrad stepsizes: a popular adaptive
(self-tuning) method for first-order stochastic optimization. Despite being well studied …
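For reference (a standard form commonly analyzed in this line of work, not necessarily the exact variant of the paper): the scalar "AdaGrad-norm" stepsize adapts to the accumulated gradient magnitudes,
\[ x_{t+1} = x_t - \frac{\eta}{\sqrt{b_0^2 + \sum_{s=1}^{t} \|g_s\|^2}}\; g_t, \]
which requires no prior knowledge of the smoothness or noise parameters.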
Parameter-free regret in high probability with heavy tails
We present new algorithms for online convex optimization over unbounded domains that
obtain parameter-free regret in high probability given access only to potentially heavy-tailed …