SGD: General analysis and improved rates
We propose a general yet simple theorem describing the convergence of SGD under the
arbitrary sampling paradigm. Our theorem describes the convergence of an infinite array of …
Not all samples are created equal: Deep learning with importance sampling
Abstract Deep Neural Network training spends most of the computation on examples that
are properly handled, and could be ignored. We propose to mitigate this phenomenon with a …
Parallel coordinate descent methods for big data optimization
In this work we show that randomized (block) coordinate descent methods can be
accelerated by parallelization when applied to the problem of minimizing the sum of a …
Accelerated, parallel, and proximal coordinate descent
We propose a new randomized coordinate descent method for minimizing the sum of
convex functions each of which depends on a small number of coordinates only. Our method …
A unified theory of SGD: Variance reduction, sampling, quantization and coordinate descent
In this paper we introduce a unified analysis of a large family of variants of proximal
stochastic gradient descent (SGD) which so far have required different intuitions …
Distributed optimization with arbitrary local solvers
With the growth of data and necessity for distributed optimization methods, solvers that work
well on a single machine must be re-designed to leverage distributed computation. Recent …
Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications
We propose a stochastic extension of the primal-dual hybrid gradient algorithm studied by
Chambolle and Pock in 2011 to solve saddle point problems that are separable in the dual …
Adding vs. averaging in distributed primal-dual optimization
Distributed optimization methods for large-scale machine learning suffer from a
communication bottleneck. It is difficult to reduce this bottleneck while still efficiently and …
Variance reduction is an antidote to byzantines: Better rates, weaker assumptions and communication compression as a cherry on the top
Byzantine-robustness has been gaining a lot of attention due to the growth of the interest in
collaborative and federated learning. However, many fruitful directions, such as the usage of …
Stochastic gradient descent-ascent and consensus optimization for smooth games: Convergence analysis under expected co-coercivity
Two of the most prominent algorithms for solving unconstrained smooth games are the
classical stochastic gradient descent-ascent (SGDA) and the recently introduced stochastic …