Convergence of Adam under relaxed assumptions

H Li, A Rakhlin, A Jadbabaie - Advances in Neural …, 2023 - proceedings.neurips.cc
In this paper, we provide a rigorous proof of convergence of the Adaptive Moment Estimation
(Adam) algorithm for a wide class of optimization objectives. Despite the popularity and …
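For reference, the standard Adam recursion whose convergence the paper studies can be sketched as below; this is the textbook update (moving averages of the gradient and its square, bias correction, coordinate-wise rescaled step), not the paper's proof machinery, and the default hyperparameters are only illustrative.

```python
import numpy as np

def adam_step(x, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update at iteration t (t starts at 1)."""
    m = beta1 * m + (1 - beta1) * grad           # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad**2        # second-moment estimate
    m_hat = m / (1 - beta1**t)                   # bias correction
    v_hat = v / (1 - beta2**t)
    x = x - lr * m_hat / (np.sqrt(v_hat) + eps)  # coordinate-wise rescaled step
    return x, m, v
```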

Faster non-convex federated learning via global and local momentum

R Das, A Acharya, A Hashemi… - Uncertainty in …, 2022 - proceedings.mlr.press
We propose FedGLOMO, a novel federated learning (FL) algorithm with an iteration
complexity of $\mathcal{O}(\epsilon^{-1.5})$ to converge to an $\epsilon$ …
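The abstract is truncated, so the sketch below is only a generic illustration of combining client-side (local) and server-side (global) momentum in a federated round; the function names, momentum parameters, and update order are hypothetical and do not reproduce the exact FedGLOMO recursion.

```python
import numpy as np

def federated_round(x, clients, server_mom, beta_g=0.9, beta_l=0.9,
                    local_steps=5, lr_local=0.01, lr_global=1.0):
    """One hypothetical round: each client runs momentum SGD locally,
    the server averages the resulting updates and applies its own momentum."""
    deltas = []
    for grad_fn in clients:                  # grad_fn(y) returns a stochastic gradient
        y, m = x.copy(), np.zeros_like(x)
        for _ in range(local_steps):
            m = beta_l * m + (1 - beta_l) * grad_fn(y)   # local momentum
            y = y - lr_local * m
        deltas.append(y - x)
    avg_delta = np.mean(deltas, axis=0)
    server_mom = beta_g * server_mom + (1 - beta_g) * avg_delta  # global momentum
    return x + lr_global * server_mom, server_mom
```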

Communication compression for Byzantine robust learning: New efficient algorithms and improved rates

A Rammal, K Gruntkowska, N Fedin… - International …, 2024 - proceedings.mlr.press
Byzantine robustness is an essential feature of algorithms for certain distributed optimization
problems, typically encountered in collaborative/federated learning. These problems are …
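As a hedged illustration of the general setting only (not of the algorithms proposed in this paper), one can pair a simple top-$k$ sparsifier with a coordinate-wise median aggregator, which tolerates a minority of arbitrarily corrupted worker messages while reducing communication.

```python
import numpy as np

def top_k(g, k):
    """Keep the k largest-magnitude coordinates of g, zero out the rest."""
    out = np.zeros_like(g)
    idx = np.argpartition(np.abs(g), -k)[-k:]
    out[idx] = g[idx]
    return out

def robust_aggregate(worker_grads, k):
    """Compress each worker's gradient, then take a coordinate-wise median,
    which is resilient to a minority of Byzantine (arbitrary) messages."""
    compressed = [top_k(g, k) for g in worker_grads]
    return np.median(np.stack(compressed), axis=0)
```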

DASHA: Distributed nonconvex optimization with communication compression, optimal oracle complexity, and no client synchronization

A Tyurin, P Richtárik - arXiv preprint arXiv:2202.01268, 2022 - arxiv.org
We develop and analyze DASHA: a new family of methods for nonconvex distributed
optimization problems. When the local functions at the nodes have a finite-sum or an …
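Methods in this family are built around compressed gradient differences; the sketch below shows only the generic pattern (each node compresses the change in its local gradient, and the server accumulates the messages into a gradient estimator), using a Rand-$k$ compressor as a stand-in. It is not the exact DASHA estimator with its momentum-style correction.

```python
import numpy as np

def rand_k(g, k, rng):
    """Unbiased Rand-k compressor: keep k random coordinates, rescale by d/k."""
    d = g.size
    out = np.zeros_like(g)
    idx = rng.choice(d, size=k, replace=False)
    out[idx] = g[idx] * (d / k)
    return out

def node_message(grad_new, grad_old, k, rng):
    """Each node transmits a compressed gradient *difference*, not a full gradient."""
    return rand_k(grad_new - grad_old, k, rng)

# Server side (schematic): g_est += mean of node messages; x -= step * g_est
```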

Trust Region Methods for Nonconvex Stochastic Optimization beyond Lipschitz Smoothness

C Xie, C Li, C Zhang, Q Deng, D Ge, Y Ye - Proceedings of the AAAI …, 2024 - ojs.aaai.org
In many important machine learning applications, the standard assumption of having a
globally Lipschitz continuous gradient may fail to hold. This paper delves into a more …
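For context, the classical trust region iteration (stated here independently of the paper's relaxed-smoothness analysis) minimizes a quadratic model within a radius $\Delta_k$ and adjusts the radius using the ratio of actual to predicted decrease:

```latex
% Trust region subproblem at iterate x_k with gradient g_k and Hessian model B_k
\min_{\|s\| \le \Delta_k} \; m_k(s) = f(x_k) + g_k^\top s + \tfrac{1}{2}\, s^\top B_k s,
\qquad
\rho_k = \frac{f(x_k) - f(x_k + s_k)}{m_k(0) - m_k(s_k)} .
```

If $\rho_k$ is large, the step $s_k$ is accepted and $\Delta_k$ may grow; otherwise the step is rejected and $\Delta_k$ shrinks.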

A stochastic proximal gradient framework for decentralized non-convex composite optimization: Topology-independent sample complexity and communication …

R Xin, S Das, UA Khan, S Kar - arXiv preprint arXiv:2110.01594, 2021 - arxiv.org
Decentralized optimization is a promising parallel computation paradigm for large-scale
data analytics and machine learning problems defined over a network of nodes. This paper …
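Composite objectives of the form $f(x) + r(x)$ in such frameworks are typically handled by a stochastic proximal gradient step; the generic centralized form, with step size $\alpha$ and stochastic gradient $g^k \approx \nabla f(x^k)$, is shown below. The decentralized algorithm additionally mixes iterates over the network, which is omitted here.

```latex
x^{k+1} = \operatorname{prox}_{\alpha r}\!\left(x^k - \alpha\, g^k\right),
\qquad
\operatorname{prox}_{\alpha r}(y) = \arg\min_{x} \Big\{ r(x) + \tfrac{1}{2\alpha}\,\|x - y\|^2 \Big\} .
```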

Breaking the lower bound with (little) structure: Acceleration in non-convex stochastic optimization with heavy-tailed noise

Z Liu, J Zhang, Z Zhou - The Thirty Sixth Annual Conference …, 2023 - proceedings.mlr.press
In this paper, we consider the stochastic optimization problem with smooth but not
necessarily convex objectives in the heavy-tailed noise regime, where the stochastic …
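A standard ingredient in the heavy-tailed regime (included here only as background, not as the paper's accelerated method) is gradient clipping, which keeps each update bounded even when the stochastic gradient $g_k$ has unbounded variance:

```latex
x_{k+1} = x_k - \eta_k\, \mathrm{clip}_{\tau_k}(g_k),
\qquad
\mathrm{clip}_{\tau}(g) = \min\!\left(1, \frac{\tau}{\|g\|}\right) g .
```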

Variance reduced distributed non-convex optimization using matrix stepsizes

H Li, A Karagulyan, P Richtárik - 2024 - repository.kaust.edu.sa
Matrix-stepsized gradient descent algorithms have been shown to have superior
performance in non-convex optimization problems compared to their scalar counterparts …
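In its generic form (the variance-reduced, distributed version is the paper's contribution), a matrix-stepsized gradient step replaces the scalar learning rate with a positive definite matrix $\mathbf{D}$, which can act as a preconditioner:

```latex
x^{k+1} = x^k - \mathbf{D}\, \nabla f(x^k),
\qquad \mathbf{D} \succ 0 \quad \text{(e.g., diagonal or a fixed preconditioner)} .
```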

Random-reshuffled SARAH does not need full gradient computations

A Beznosikov, M Takáč - Optimization Letters, 2024 - Springer
The StochAstic Recursive grAdient algoritHm (SARAH) is a variance-reduced
variant of the Stochastic Gradient Descent algorithm that needs a gradient of the …
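For reference, the plain SARAH estimator is the recursion sketched below; the full gradient computed at the start of each outer loop is exactly the step that the random-reshuffling variant studied here avoids. The sketch assumes access to a full-gradient oracle `grad_full(w)` and per-sample gradients `grad_i(w, i)`, which are illustrative names.

```python
import numpy as np

def sarah_epoch(w, grad_full, grad_i, n, lr, rng):
    """One SARAH outer loop: full gradient once, then recursive
    single-sample updates v_t = g_i(w_t) - g_i(w_{t-1}) + v_{t-1}."""
    v = grad_full(w)              # the full-gradient step the paper removes
    w_prev = w.copy()
    w = w - lr * v
    for _ in range(n - 1):
        i = rng.integers(n)       # sampled index (reshuffling variants permute instead)
        v = grad_i(w, i) - grad_i(w_prev, i) + v
        w_prev = w.copy()
        w = w - lr * v
    return w
```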

A Unified Model for Large-Scale Inexact Fixed-Point Iteration: A Stochastic Optimization Perspective

A Hashemi - IEEE Transactions on Automatic Control, 2024 - ieeexplore.ieee.org
Calculating fixed points of a nonlinear function is a central problem in many areas of science
and engineering with applications ranging from the study of dynamical systems to …
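The basic object of study is the fixed-point iteration $x^{k+1} = T(x^k)$ for an operator $T$ with $T(x^\star) = x^\star$. A generic template for the inexact/stochastic setting such unified models cover (not the paper's specific model) is the relaxed Krasnosel'skii–Mann iteration with relaxation $\alpha_k$ and evaluation error $e^k$:

```latex
x^{k+1} = (1 - \alpha_k)\, x^k + \alpha_k \left( T(x^k) + e^k \right),
\qquad \alpha_k \in (0, 1] .
```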