Convergence of Adam under relaxed assumptions
In this paper, we provide a rigorous proof of convergence of the Adaptive Moment Estimation
(Adam) algorithm for a wide class of optimization objectives. Despite the popularity and …
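For context, here is a minimal sketch of the Adam update that such convergence analyses study, in the standard formulation of Kingma and Ba; the hyperparameter defaults below are the conventional ones, not values taken from the cited paper.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (Kingma & Ba): exponential moving averages of the gradient
    and its elementwise square, with bias correction. Defaults are conventional."""
    m = beta1 * m + (1 - beta1) * grad            # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad**2         # second (raw) moment estimate
    m_hat = m / (1 - beta1**t)                    # bias-corrected first moment (t >= 1)
    v_hat = v / (1 - beta2**t)                    # bias-corrected second moment
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v
```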
Faster non-convex federated learning via global and local momentum
Abstract: We propose \texttt{FedGLOMO}, a novel federated learning (FL) algorithm with an
iteration complexity of $\mathcal{O}(\epsilon^{-1.5})$ to converge to an $\epsilon …
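As a rough illustration of the general idea of combining client-side (local) and server-side (global) momentum in FL, a hedged sketch follows. It is not the \texttt{FedGLOMO} update itself; the function name, the client oracle interface, and both momentum coefficients are hypothetical choices for this example.

```python
import numpy as np

def fl_round_with_momentum(x, client_oracles, local_steps=5, lr=0.1,
                           local_beta=0.9, global_beta=0.9, global_m=None):
    """One illustrative FL round: each client runs momentum SGD from the shared
    iterate x, then the server averages the resulting displacements and applies
    its own (global) momentum. Hypothetical sketch, not the FedGLOMO update."""
    updates = []
    for oracle in client_oracles:                 # one stochastic-gradient oracle per client
        x_local, m_local = x.copy(), np.zeros_like(x)
        for _ in range(local_steps):
            g = oracle(x_local)
            m_local = local_beta * m_local + g    # client-side (local) momentum buffer
            x_local = x_local - lr * m_local
        updates.append(x_local - x)               # client's net displacement this round
    avg_update = np.mean(updates, axis=0)
    global_m = avg_update if global_m is None else global_beta * global_m + avg_update
    return x + global_m, global_m                 # server-side (global) momentum step
```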
Communication compression for Byzantine robust learning: New efficient algorithms and improved rates
Byzantine robustness is an essential feature of algorithms for certain distributed optimization
problems, typically encountered in collaborative/federated learning. These problems are …
DASHA: Distributed nonconvex optimization with communication compression, optimal oracle complexity, and no client synchronization
We develop and analyze DASHA: a new family of methods for nonconvex distributed
optimization problems. When the local functions at the nodes have a finite-sum or an …
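To fix intuition for communication compression in this line of work, here is a hedged sketch of a compressed-gradient-difference message built on an unbiased Rand-$k$ sparsifier. It illustrates the general mechanism only, not DASHA's actual estimator, and both helper names are invented for this example.

```python
import numpy as np

def rand_k(v, k, rng):
    """Unbiased Rand-k compressor: keep k random coordinates and rescale by d/k."""
    d = v.size
    mask = np.zeros(d)
    mask[rng.choice(d, size=k, replace=False)] = 1.0
    return (d / k) * mask * v

def node_message(grad_new, h, k, rng):
    """A node compresses the difference between its fresh gradient and its local
    state h, then shifts h by the compressed difference. Illustrates the
    compressed-difference mechanism only, not DASHA's exact estimator."""
    c = rand_k(grad_new - h, k, rng)
    return c, h + c                               # (message sent to server, updated local state)
```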
Trust Region Methods for Nonconvex Stochastic Optimization beyond Lipschitz Smoothness
In many important machine learning applications, the standard assumption of having a
globally Lipschitz continuous gradient may fail to hold. This paper delves into a more …
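For reference, the classical Cauchy-point step for the trust-region subproblem is sketched below (standard textbook formula, e.g. Nocedal and Wright), shown only to fix notation; the cited paper's stochastic, beyond-Lipschitz setting adds machinery this sketch does not attempt to capture.

```python
import numpy as np

def cauchy_point_step(grad, hess, radius):
    """Cauchy-point solution of the trust-region subproblem
    min_p g^T p + 0.5 p^T H p subject to ||p|| <= radius
    (standard textbook formula)."""
    g_norm = np.linalg.norm(grad)
    if g_norm == 0.0:
        return np.zeros_like(grad)                # already first-order stationary
    gHg = grad @ hess @ grad
    if gHg <= 0:
        tau = 1.0                                 # negative curvature along -g: step to the boundary
    else:
        tau = min(1.0, g_norm**3 / (radius * gHg))
    return -tau * (radius / g_norm) * grad
```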
A stochastic proximal gradient framework for decentralized non-convex composite optimization: Topology-independent sample complexity and communication …
Decentralized optimization is a promising parallel computation paradigm for large-scale
data analytics and machine learning problems defined over a network of nodes. This paper …
Breaking the lower bound with (little) structure: Acceleration in non-convex stochastic optimization with heavy-tailed noise
In this paper, we consider the stochastic optimization problem with smooth but not
necessarily convex objectives in the heavy-tailed noise regime, where the stochastic …
Variance reduced distributed non-convex optimization using matrix stepsizes
Matrix-stepsized gradient descent algorithms have been shown to have superior
performance in non-convex optimization problems compared to their scalar counterparts …
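A matrix stepsize replaces the scalar learning rate with a positive semidefinite matrix $D$, giving the update $x_{k+1} = x_k - D \nabla f(x_k)$. A minimal sketch of this update class follows, with $D$ left as a generic input rather than the paper's specific construction.

```python
import numpy as np

def matrix_stepsize_step(x, grad, D):
    """One matrix-stepsized gradient step x_{k+1} = x_k - D * grad, where D is a
    fixed positive semidefinite stepsize matrix. Generic form of the update only."""
    return x - D @ grad

# Example: per-coordinate (diagonal) stepsizes as a simple instance of D.
# D = np.diag([0.1, 0.01, 0.5])
```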
Random-reshuffled SARAH does not need full gradient computations
Abstract: The StochAstic Recursive grAdient algoritHm (SARAH) is a variance-reduced
variant of the Stochastic Gradient Descent algorithm that needs a gradient of the …
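For reference, a sketch of the standard SARAH inner loop, whose recursive estimator $v_t = \nabla f_{i_t}(x_t) - \nabla f_{i_t}(x_{t-1}) + v_{t-1}$ is the object the random-reshuffling analysis modifies. This sketch uses uniform sampling and a plain list of per-sample gradient functions; it is not the reshuffled variant studied in the cited paper.

```python
import numpy as np

def sarah_inner_loop(x, sample_grads, lr=0.1, inner_steps=10, rng=None):
    """Standard SARAH inner loop: start from a full gradient, then update the
    estimator with single-sample gradient differences,
        v_t = grad_i(x_t) - grad_i(x_{t-1}) + v_{t-1}.
    `sample_grads` is a list of per-sample gradient functions."""
    rng = rng or np.random.default_rng()
    v = np.mean([g(x) for g in sample_grads], axis=0)        # full gradient at the anchor point
    x_prev, x = x, x - lr * v
    for _ in range(inner_steps):
        i = rng.integers(len(sample_grads))
        v = sample_grads[i](x) - sample_grads[i](x_prev) + v  # recursive variance-reduced estimator
        x_prev, x = x, x - lr * v
    return x
```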
A Unified Model for Large-Scale Inexact Fixed-Point Iteration: A Stochastic Optimization Perspective
A. Hashemi, IEEE Transactions on Automatic Control, 2024 (ieeexplore.ieee.org)
Calculating fixed points of a nonlinear function is a central problem in many areas of science
and engineering with applications ranging from the study of dynamical systems to …
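As a generic illustration of the inexact fixed-point setting (not the paper's unified model), here is a Krasnoselskii-Mann style sketch with a noisy operator oracle; the oracle name and the constant relaxation parameter are assumptions made for this example.

```python
import numpy as np

def inexact_fixed_point(T_noisy, x0, steps=1000, alpha=0.5):
    """Krasnoselskii-Mann style iteration x_{k+1} = (1 - alpha) x_k + alpha T(x_k),
    where the map T is only available through an inexact/noisy oracle T_noisy.
    Generic illustration of inexact fixed-point iteration only."""
    x = np.asarray(x0, dtype=float)
    for _ in range(steps):
        x = (1 - alpha) * x + alpha * T_noisy(x)
    return x
```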