Convergence of Adam under relaxed assumptions
In this paper, we provide a rigorous proof of convergence of the Adaptive Moment Estimation
(Adam) algorithm for a wide class of optimization objectives. Despite the popularity and …
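For context, the Adam iteration analyzed here is the standard exponential-moving-average update; below is a minimal NumPy sketch of one step (the hyperparameter defaults are the usual choices, not values taken from this paper, and the function name is illustrative).

```python
import numpy as np

def adam_step(x, grad, m, v, t, alpha=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One step of the standard Adam update; the cited paper studies when
    these iterates converge, it does not propose a new variant."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad**2     # second-moment estimate
    m_hat = m / (1 - beta1**t)                # bias correction, t >= 1
    v_hat = v / (1 - beta2**t)
    x = x - alpha * m_hat / (np.sqrt(v_hat) + eps)
    return x, m, v
```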
EF21: A new, simpler, theoretically better, and practically faster error feedback
P Richtárik, I Sokolov… - Advances in Neural …, 2021 - proceedings.neurips.cc
Error feedback (EF), also known as error compensation, is an immensely popular
convergence stabilization mechanism in the context of distributed training of supervised …
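As a reminder of the mechanism, EF21 has each worker compress the change in its gradient estimate rather than the gradient itself. The sketch below is a single-process illustration of one such round under an assumed top-k compressor; the function names and loop structure are illustrative, not the authors' code.

```python
import numpy as np

def top_k(v, k):
    """Assumed contractive compressor: keep the k largest-magnitude entries."""
    out = np.zeros_like(v)
    idx = np.argsort(np.abs(v))[-k:]
    out[idx] = v[idx]
    return out

def ef21_round(x, grads, g_workers, g_server, gamma=0.1, k=1):
    """One EF21-style round: worker i sends c_i = C(grad_i - g_i) and sets
    g_i += c_i; the server accumulates the compressed differences and takes
    a descent step with its maintained estimate."""
    n = len(grads)
    for i in range(n):
        c_i = top_k(grads[i] - g_workers[i], k)
        g_workers[i] = g_workers[i] + c_i
        g_server = g_server + c_i / n
    x = x - gamma * g_server
    return x, g_workers, g_server
```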
Adaptive SGD with Polyak stepsize and line-search: Robust convergence and variance reduction
The recently proposed stochastic Polyak stepsize (SPS) and stochastic line-search (SLS) for
SGD have shown remarkable effectiveness when training over-parameterized models …
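To make the stepsize rule concrete, the stochastic Polyak stepsize sets the learning rate from the current mini-batch loss and gradient. Here is a minimal sketch of the capped variant; the cap gamma_max, the scaling c, and the default loss_i_star = 0 (interpolation) are assumed illustrative choices.

```python
import numpy as np

def sps_step(x, loss_i, grad_i, loss_i_star=0.0, c=0.5, gamma_max=1.0):
    """SGD step with a (capped) stochastic Polyak stepsize:
    gamma = min((f_i(x) - f_i*) / (c * ||grad f_i(x)||^2), gamma_max),
    where f_i* is the optimal value of the sampled loss (0 for
    interpolating, over-parameterized models)."""
    gamma = min((loss_i - loss_i_star) / (c * np.dot(grad_i, grad_i) + 1e-12),
                gamma_max)
    return x - gamma * grad_i
```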
MARINA: Faster non-convex distributed learning with compression
We develop and analyze MARINA: a new communication efficient method for non-convex
distributed learning over heterogeneous datasets. MARINA employs a novel communication …
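The communication scheme in MARINA-type methods can be summarized as follows: with small probability all workers send full gradients, and otherwise each sends a compressed difference of its gradients at consecutive iterates. The sketch below is a single-process illustration under an assumed random-sparsification compressor, not the authors' implementation.

```python
import numpy as np

def rand_k(v, k, rng):
    """Assumed unbiased compressor: keep k random coordinates, rescaled."""
    out = np.zeros_like(v)
    idx = rng.choice(v.size, size=k, replace=False)
    out[idx] = v[idx] * (v.size / k)
    return out

def marina_round(x, grads_new, grads_old, g_prev, p, gamma, rng, k=1):
    """One MARINA-style round: with probability p the full averaged gradient
    is communicated; otherwise the running estimate g is updated with
    averaged compressed gradient differences."""
    n = len(grads_new)
    if rng.random() < p:
        g = sum(grads_new) / n
    else:
        g = g_prev + sum(rand_k(grads_new[i] - grads_old[i], k, rng)
                         for i in range(n)) / n
    return x - gamma * g, g
```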
Stochastic policy gradient methods: Improved sample complexity for Fisher-non-degenerate policies
Recently, the impressive empirical success of policy gradient (PG) methods has catalyzed
the development of their theoretical foundations. Despite the huge efforts directed at the …
Towards a theory of non-log-concave sampling: first-order stationarity guarantees for Langevin Monte Carlo
K Balasubramanian, S Chewi… - … on Learning Theory, 2022 - proceedings.mlr.press
For the task of sampling from a density $\pi \propto \exp(-V)$ on $\mathbb{R}^d$, where $V$ is
possibly non-convex but $L$-gradient Lipschitz, we prove that averaged Langevin Monte …
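To make the object of study concrete, Langevin Monte Carlo iterates a noisy gradient step on the potential $V$. Below is a minimal sketch; the step size, iteration count, and the fact that the full trajectory is returned (so averaged quantities can be computed) are illustrative choices, not the paper's setup.

```python
import numpy as np

def langevin_monte_carlo(grad_V, x0, h=1e-2, n_steps=1000, rng=None):
    """Unadjusted Langevin Monte Carlo for sampling from pi ~ exp(-V):
    x_{k+1} = x_k - h * grad V(x_k) + sqrt(2h) * xi_k, with xi_k ~ N(0, I).
    Returns the trajectory so averaged iterates/statistics can be formed."""
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float)
    traj = [x.copy()]
    for _ in range(n_steps):
        x = x - h * grad_V(x) + np.sqrt(2 * h) * rng.standard_normal(x.shape)
        traj.append(x.copy())
    return np.array(traj)
```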
SoteriaFL: A unified framework for private federated learning with communication compression
To enable large-scale machine learning in bandwidth-hungry environments such as
wireless networks, significant progress has been made recently in designing communication …
The complexity of nonconvex-strongly-concave minimax optimization
This paper studies the complexity for finding approximate stationary points of nonconvex-
strongly-concave (NC-SC) smooth minimax problems, in both general and averaged smooth …
A novel framework for policy mirror descent with general parameterization and linear convergence
Modern policy optimization methods in reinforcement learning, such as TRPO and PPO, owe
their success to the use of parameterized policies. However, while theoretical guarantees …
Variance reduction is an antidote to Byzantines: Better rates, weaker assumptions and communication compression as a cherry on the top
Byzantine-robustness has been gaining a lot of attention due to the growing interest in
collaborative and federated learning. However, many fruitful directions, such as the usage of …