Convergence of Adam under relaxed assumptions

H Li, A Rakhlin, A Jadbabaie - Advances in Neural …, 2024 - proceedings.neurips.cc
In this paper, we provide a rigorous proof of convergence of the Adaptive Moment Estimation
(Adam) algorithm for a wide class of optimization objectives. Despite the popularity and …
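
For orientation, the update rule studied in such results is the standard Adam iteration; a minimal sketch follows, where the hyperparameter names and defaults (lr, beta1, beta2, eps) are generic choices and not values taken from the paper:

    import numpy as np

    def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
        # Standard Adam update (generic sketch, not the paper's specific variant).
        m = beta1 * m + (1 - beta1) * grad          # first-moment (momentum) estimate
        v = beta2 * v + (1 - beta2) * grad ** 2     # second-moment estimate
        m_hat = m / (1 - beta1 ** t)                # bias correction, t >= 1
        v_hat = v / (1 - beta2 ** t)
        theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
        return theta, m, v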

Generalized-smooth nonconvex optimization is as efficient as smooth nonconvex optimization

Z Chen, Y Zhou, Y Liang, Z Lu - International Conference on …, 2023 - proceedings.mlr.press
Various optimal gradient-based algorithms have been developed for smooth nonconvex
optimization. However, many nonconvex machine learning problems do not belong to the …
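
The "generalized smoothness" in this line of work usually refers to the (L0, L1)-smoothness condition, which relaxes the standard Lipschitz-gradient assumption; a common statement, given for orientation rather than as this paper's exact definition, is

    \|\nabla^2 f(x)\| \le L_0 + L_1 \|\nabla f(x)\|,

or, in gradient form,

    \|\nabla f(x) - \nabla f(y)\| \le (L_0 + L_1 \|\nabla f(x)\|)\,\|x - y\| \quad \text{whenever } \|x - y\| \le 1/L_1.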

Federated learning with client subsampling, data heterogeneity, and unbounded smoothness: A new algorithm and lower bounds

M Crawshaw, Y Bao, M Liu - Advances in Neural …, 2024 - proceedings.neurips.cc
We study the problem of Federated Learning (FL) under client subsampling and data
heterogeneity with an objective function that has potentially unbounded smoothness. This …

Adam-family methods for nonsmooth optimization with convergence guarantees

… and communication compression
B Li, Y Chi - IEEE Journal of Selected Topics in Signal …, 2025 - ieeexplore.ieee.org
Achieving communication efficiency in decentralized machine learning has been attracting
significant attention, with communication compression recognized as an effective technique …
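
In this setting, a compression operator is commonly modeled as a contractive compressor; a standard form of the assumption, given for illustration and not necessarily the exact condition used in this paper, is an operator C satisfying

    \mathbb{E}\,\|\mathcal{C}(x) - x\|^2 \le (1 - \delta)\,\|x\|^2 \quad \text{for all } x \text{ and some } \delta \in (0, 1],

which covers, for example, top-k sparsification.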

Gradient-variation online learning under generalized smoothness

YF Xie, P Zhao, ZH Zhou - arXiv preprint arXiv:2408.09074, 2024 - arxiv.org
Gradient-variation online learning aims to achieve regret guarantees that scale with
variations in the gradients of online functions, which has been shown to be crucial for …
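
Here "gradient variation" usually denotes the cumulative change of the gradients across rounds; the standard quantity, stated for orientation, is

    V_T = \sum_{t=2}^{T} \sup_{x \in \mathcal{X}} \|\nabla f_t(x) - \nabla f_{t-1}(x)\|^2,

and the goal is regret guarantees that scale with V_T rather than with the horizon T alone.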

Error Feedback under (L0,L1)-Smoothness: Normalization and Momentum

S Khirirat, A Sadiev, A Riabinin, E Gorbunov… - arXiv preprint arXiv …, 2024 - arxiv.org
We provide the first proof of convergence for normalized error feedback algorithms across a
wide range of machine learning problems. Despite their popularity and efficiency in training …
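
To illustrate the mechanism in generic form, a minimal sketch of error feedback combined with gradient normalization follows; the top-k compressor, step size, and function names are illustrative placeholders, not the algorithm or notation from the paper:

    import numpy as np

    def top_k(x, k):
        # Illustrative contractive compressor: keep the k largest-magnitude entries
        # (assumes 1 <= k <= x.size).
        out = np.zeros_like(x)
        idx = np.argpartition(np.abs(x), -k)[-k:]
        out[idx] = x[idx]
        return out

    def ef_normalized_step(theta, grad, e, lr=0.1, k=10):
        # Generic error-feedback step with a normalized gradient (sketch only).
        p = e + grad / (np.linalg.norm(grad) + 1e-12)  # normalized gradient plus error memory
        delta = top_k(p, k)                            # compressed message actually transmitted
        e = p - delta                                  # error memory keeps what was dropped
        theta = theta - lr * delta                     # apply the compressed update
        return theta, e

The error memory e accumulates whatever the compressor discards, so that information is re-injected into later updates rather than lost.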