Convergence of Adam under relaxed assumptions

H Li, A Rakhlin, A Jadbabaie - Advances in Neural Information Processing Systems, 2023 - proceedings.neurips.cc
In this paper, we provide a rigorous proof of convergence of the Adaptive Moment Estimation
(Adam) algorithm for a wide class of optimization objectives. Despite the popularity and …
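For context, the Adam update being analyzed takes the standard Kingma-Ba form: exponential moving averages of the gradient and its square, with bias correction. A minimal NumPy sketch on a toy quadratic; the stepsize and moment parameters here are common defaults, not the values from the paper's analysis.

```python
import numpy as np

def adam(grad, x0, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8, steps=1000):
    """Minimal Adam: first/second moment estimates with bias correction."""
    x = x0.astype(float)
    m = np.zeros_like(x)  # first moment (EMA of gradients)
    v = np.zeros_like(x)  # second moment (EMA of squared gradients)
    for t in range(1, steps + 1):
        g = grad(x)
        m = beta1 * m + (1 - beta1) * g
        v = beta2 * v + (1 - beta2) * g**2
        m_hat = m / (1 - beta1**t)  # bias correction
        v_hat = v / (1 - beta2**t)
        x -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return x

# Toy objective f(x) = ||x||^2 / 2, so grad(x) = x.
print(adam(lambda x: x, np.array([5.0, -3.0]), lr=0.1, steps=2000))
```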

EF21: A new, simpler, theoretically better, and practically faster error feedback

P Richtárik, I Sokolov… - Advances in Neural Information Processing Systems, 2021 - proceedings.neurips.cc
Error feedback (EF), also known as error compensation, is an immensely popular
convergence stabilization mechanism in the context of distributed training of supervised …
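The EF21 mechanism itself is a few lines: each worker keeps a gradient estimate g_i and communicates only a compressed correction C(∇f_i(x) − g_i). A hedged single-process NumPy sketch with a Top-k compressor (one of the contractive compressors the paper covers); the two-worker quadratic problem and stepsize are illustrative.

```python
import numpy as np

def top_k(v, k):
    """Contractive Top-k compressor: keep the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef21(grads, x0, lr=0.1, k=1, steps=500):
    """grads: per-worker gradient oracles; simulates n workers in-process."""
    x = x0.astype(float)
    g = [gi(x) for gi in grads]          # initial estimates, sent uncompressed
    for _ in range(steps):
        x = x - lr * np.mean(g, axis=0)  # server step with averaged estimates
        # each worker sends only a compressed correction toward its gradient
        g = [gi_est + top_k(gi(x) - gi_est, k)
             for gi, gi_est in zip(grads, g)]
    return x

# Two workers with quadratics f_i(x) = ||x - c_i||^2 / 2.
c = [np.array([1.0, 0.0]), np.array([0.0, 3.0])]
print(ef21([lambda x, ci=ci: x - ci for ci in c], np.zeros(2)))  # -> ~[0.5, 1.5]
```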

Adaptive SGD with Polyak stepsize and line-search: Robust convergence and variance reduction

X Jiang, SU Stich - Advances in Neural Information Processing Systems, 2024 - proceedings.neurips.cc
The recently proposed stochastic Polyak stepsize (SPS) and stochastic line-search (SLS) for
SGD have shown remarkable effectiveness when training over-parameterized models …
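The stochastic Polyak stepsize underlying this line of work sets γ_t = (f_i(x_t) − f_i*)/(c·‖∇f_i(x_t)‖²), capped at some γ_max (the SPS_max variant). A sketch under the common interpolation assumption f_i* = 0; the constant c = 0.5, the cap, and the toy least-squares problem are illustrative choices, not the paper's tuning.

```python
import numpy as np

def sgd_sps(fs, grads, x0, c=0.5, gamma_max=10.0, steps=2000, seed=0):
    """SGD with stochastic Polyak stepsize (SPS_max), assuming inf f_i = 0."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float)
    for _ in range(steps):
        i = rng.integers(len(fs))          # sample one component function
        g = grads[i](x)
        gamma = min(fs[i](x) / (c * np.dot(g, g) + 1e-12), gamma_max)
        x -= gamma * g                     # adaptive, tuning-free step
    return x

# Interpolated least squares: both components are minimized at x = (1, 2).
target = np.array([1.0, 2.0])
A = [np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]])]
fs = [lambda x, Ai=Ai: 0.5 * np.sum((Ai @ (x - target))**2) for Ai in A]
grads = [lambda x, Ai=Ai: Ai.T @ (Ai @ (x - target)) for Ai in A]
print(sgd_sps(fs, grads, np.zeros(2)))     # -> ~[1, 2]
```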

MARINA: Faster non-convex distributed learning with compression

E Gorbunov, KP Burlachenko, Z Li… - International Conference on Machine Learning, 2021 - proceedings.mlr.press
We develop and analyze MARINA: a new communication-efficient method for non-convex
distributed learning over heterogeneous datasets. MARINA employs a novel communication …
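MARINA's communication trick is a shared coin flip: with small probability p all workers send full gradients, otherwise each sends only an unbiasedly compressed gradient difference Q(∇f_i(x_{t+1}) − ∇f_i(x_t)). A single-process sketch with a Rand-k compressor (scaled to be unbiased); p, the stepsize, and k are illustrative.

```python
import numpy as np

def rand_k(v, k, rng):
    """Unbiased Rand-k compressor: k random coordinates, scaled by d/k."""
    out = np.zeros_like(v)
    idx = rng.choice(len(v), size=k, replace=False)
    out[idx] = v[idx] * (len(v) / k)
    return out

def marina(grads, x0, lr=0.05, p=0.2, k=1, steps=500, seed=0):
    rng = np.random.default_rng(seed)
    x = x0.astype(float)
    g = np.mean([gi(x) for gi in grads], axis=0)   # initial full sync
    for _ in range(steps):
        x_new = x - lr * g
        if rng.random() < p:                       # shared coin: full gradients
            g = np.mean([gi(x_new) for gi in grads], axis=0)
        else:                                      # compressed differences only
            g = g + np.mean([rand_k(gi(x_new) - gi(x), k, rng)
                             for gi in grads], axis=0)
        x = x_new
    return x

c = [np.array([1.0, 0.0]), np.array([0.0, 3.0])]
print(marina([lambda x, ci=ci: x - ci for ci in c], np.zeros(2)))  # ~[0.5, 1.5]
```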

Stochastic policy gradient methods: Improved sample complexity for Fisher-non-degenerate policies

I Fatkhullin, A Barakat, A Kireeva… - International Conference on Machine Learning, 2023 - proceedings.mlr.press
Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed
the development of their theoretical foundations. Despite the huge efforts directed at the …
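The basic estimator this literature refines is the score-function (REINFORCE) policy gradient, ∇J(θ) = E[R·∇log π_θ(a)]. A minimal sketch on a two-armed Gaussian bandit with a softmax policy (softmax policies are a standard example of the Fisher-non-degenerate class); this illustrates the vanilla estimator, not the paper's improved method.

```python
import numpy as np

def softmax(theta):
    e = np.exp(theta - theta.max())
    return e / e.sum()

def reinforce_bandit(means, lr=0.1, episodes=5000, seed=0):
    """Vanilla score-function (REINFORCE) estimator on a Gaussian bandit."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(len(means))              # softmax policy logits
    for _ in range(episodes):
        probs = softmax(theta)
        a = rng.choice(len(means), p=probs)   # sample an action
        r = means[a] + rng.normal()           # noisy reward
        grad_logp = -probs
        grad_logp[a] += 1.0                   # grad of log pi(a) under softmax
        theta += lr * r * grad_logp           # stochastic policy gradient step
    return softmax(theta)

print(reinforce_bandit(np.array([1.0, 2.0])))  # mass shifts to the better arm
```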

Towards a theory of non-log-concave sampling: first-order stationarity guarantees for Langevin Monte Carlo

K Balasubramanian, S Chewi… - Conference on Learning Theory, 2022 - proceedings.mlr.press
For the task of sampling from a density $\pi \propto \exp(-V)$ on $\mathbb{R}^d$, where $V$ is
possibly non-convex but $L$-gradient Lipschitz, we prove that averaged Langevin Monte …
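The iteration analyzed is the Euler discretization of the Langevin diffusion, x_{k+1} = x_k − h∇V(x_k) + √(2h)·ξ_k with ξ_k ∼ N(0, I). A sketch on a toy double-well potential; note this target is only locally gradient Lipschitz, and the step size and iteration count are purely illustrative.

```python
import numpy as np

def lmc(grad_V, x0, h=0.01, steps=20000, seed=0):
    """Langevin Monte Carlo: x <- x - h*grad_V(x) + sqrt(2h)*noise."""
    rng = np.random.default_rng(seed)
    x = x0.astype(float)
    samples = np.empty((steps, len(x)))
    for k in range(steps):
        x = x - h * grad_V(x) + np.sqrt(2 * h) * rng.normal(size=x.shape)
        samples[k] = x
    return samples

# Double-well V(x) = (x^2 - 1)^2 / 4: non-convex, pi ~ exp(-V) is bimodal.
grad_V = lambda x: x * (x**2 - 1)
samples = lmc(grad_V, np.zeros(1))
# The chain hops between the modes near -1 and +1.
print(samples.mean(), (np.abs(samples) > 0.5).mean())
```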

SoteriaFL: A unified framework for private federated learning with communication compression

Z Li, H Zhao, B Li, Y Chi - Advances in Neural Information …, 2022 - proceedings.neurips.cc
To enable large-scale machine learning in bandwidth-hungry environments such as
wireless networks, significant progress has been made recently in designing communication …
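SoteriaFL couples communication compression with differential-privacy noise inside a shifted, variance-reduced framework; the sketch below shows only the generic building block (a client perturbing its gradient with Gaussian noise, then compressing the message) and is not the paper's full shifted-compression scheme. The noise scale and sparsity level are illustrative.

```python
import numpy as np

def private_compressed_grad(grad, sigma, k, rng):
    """Generic DP-then-compress client message: Gaussian noise, then Rand-k."""
    g = grad + rng.normal(scale=sigma, size=grad.shape)  # privacy noise
    out = np.zeros_like(g)
    idx = rng.choice(len(g), size=k, replace=False)
    out[idx] = g[idx] * (len(g) / k)                     # unbiased Rand-k
    return out

rng = np.random.default_rng(0)
true_grad = np.array([1.0, -2.0, 0.5, 3.0])
msgs = [private_compressed_grad(true_grad, sigma=0.1, k=2, rng=rng)
        for _ in range(1000)]
print(np.mean(msgs, axis=0))  # averages back to roughly the true gradient
```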

The complexity of nonconvex-strongly-concave minimax optimization

S Zhang, J Yang, C Guzmán… - Uncertainty in Artificial Intelligence, 2021 - proceedings.mlr.press
This paper studies the complexity for finding approximate stationary points of nonconvex-
strongly-concave (NC-SC) smooth minimax problems, in both general and averaged smooth …
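For NC-SC problems min_x max_y f(x, y), stationarity is measured for the envelope Φ(x) = max_y f(x, y). The canonical baseline such complexity results are benchmarked against is two-timescale gradient descent-ascent, sketched here on a toy objective that is non-convex in x and 2-strongly concave in y; the stepsizes are illustrative.

```python
import numpy as np

def gda(grad_x, grad_y, x0, y0, lr_x=0.01, lr_y=0.1, steps=5000):
    """Two-timescale gradient descent-ascent: descend in x, ascend in y."""
    x, y = float(x0), float(y0)
    for _ in range(steps):
        x, y = x - lr_x * grad_x(x, y), y + lr_y * grad_y(x, y)
    return x, y

# f(x, y) = x^4/4 - x^2/2 + x*y - y^2, so Phi(x) = x^4/4 - x^2/4.
grad_x = lambda x, y: x**3 - x + y
grad_y = lambda x, y: x - 2 * y
x, y = gda(grad_x, grad_y, x0=0.9, y0=0.0)
# y tracks y*(x) = x/2; x approaches a stationary point of Phi (x = 1/sqrt(2)).
print(x, y)
```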

A novel framework for policy mirror descent with general parameterization and linear convergence

C Alfano, R Yuan, P Rebeschini - Advances in Neural Information Processing Systems, 2023 - proceedings.neurips.cc
Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe
their success to the use of parameterized policies. However, while theoretical guarantees …
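In the tabular case with the KL mirror map, a policy mirror descent step reduces to the multiplicative update π_{k+1}(a|s) ∝ π_k(a|s)·exp(η·Q^{π_k}(s, a)); the paper's contribution is extending such steps to general parameterized policy classes with linear convergence. A one-state (bandit) sketch of the tabular step, with a fixed Q for simplicity (in RL, Q^{π_k} would be re-evaluated every iteration):

```python
import numpy as np

def pmd_step(pi, Q, eta):
    """One policy mirror descent step with the KL mirror map (per state)."""
    new = pi * np.exp(eta * Q)   # multiplicative / exponentiated-gradient update
    return new / new.sum()       # renormalize to a probability distribution

Q = np.array([1.0, 2.0, 0.5])    # action values in a single state
pi = np.ones(3) / 3              # uniform initial policy
for _ in range(50):
    pi = pmd_step(pi, Q, eta=0.5)
print(pi)                        # mass concentrates on argmax Q (action 1)
```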

Variance reduction is an antidote to Byzantines: Better rates, weaker assumptions and communication compression as a cherry on the top

E Gorbunov, S Horváth, P Richtárik, G Gidel - arXiv preprint, 2022 - arxiv.org
Byzantine-robustness has been gaining a lot of attention due to the growing interest in
collaborative and federated learning. However, many fruitful directions, such as the usage of …
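The paper pairs a variance-reduced gradient estimator with a robust aggregator; the sketch below shows only the robust-aggregation ingredient, using coordinate-wise median as a simple stand-in aggregator (the paper's framework admits other agnostic robust aggregators). Worker counts and the attack vector are illustrative.

```python
import numpy as np

def robust_aggregate(msgs):
    """Coordinate-wise median: tolerates a minority of Byzantine messages."""
    return np.median(np.stack(msgs), axis=0)

rng = np.random.default_rng(0)
honest = [np.array([1.0, -2.0]) + 0.1 * rng.normal(size=2) for _ in range(7)]
byzantine = [np.array([100.0, 100.0]) for _ in range(3)]  # adversarial workers
print(robust_aggregate(honest + byzantine))  # stays near the honest mean
print(np.mean(honest + byzantine, axis=0))   # plain averaging is destroyed
```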