SGD: General analysis and improved rates

RM Gower, N Loizou, X Qian… - International …, 2019 - proceedings.mlr.press
We propose a general yet simple theorem describing the convergence of SGD under the
arbitrary sampling paradigm. Our theorem describes the convergence of an infinite array of …
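
As a concrete reference point (not the paper's exact estimator or step-size theory), the sketch below runs SGD on a finite-sum least-squares problem where each minibatch is drawn from a user-chosen sampling distribution and importance-weighted so the gradient estimator stays unbiased; all names and constants are illustrative.

```python
# Hedged sketch: SGD under a user-chosen sampling distribution for
# f(x) = (1/n) * sum_i 0.5 * (a_i^T x - b_i)^2.
import numpy as np

def sgd_arbitrary_sampling(A, b, probs, stepsize=0.01, batch=8, iters=2000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    x = np.zeros(d)
    for _ in range(iters):
        S = rng.choice(n, size=batch, p=probs)          # indices drawn with probability p_i
        residual = A[S] @ x - b[S]                      # per-example residuals a_i^T x - b_i
        weights = 1.0 / (n * probs[S])                  # 1/(n * p_i) keeps the estimator unbiased
        grad = (A[S] * (weights * residual)[:, None]).mean(axis=0)
        x -= stepsize * grad
    return x

# Example: uniform sampling; any valid probability vector over the n rows works.
rng = np.random.default_rng(1)
A = rng.normal(size=(200, 5))
b = A @ np.ones(5)
x_hat = sgd_arbitrary_sampling(A, b, probs=np.full(200, 1 / 200))
```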

Not all samples are created equal: Deep learning with importance sampling

A Katharopoulos, F Fleuret - International conference on …, 2018 - proceedings.mlr.press
Deep Neural Network training spends most of the computation on examples that
are properly handled, and could be ignored. We propose to mitigate this phenomenon with a …
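
A hedged illustration of the general idea (the paper itself scores examples by an upper bound on the per-example gradient norm, not the raw loss used here): sample a minibatch with probability proportional to a per-example score and reweight the loss so the update stays unbiased. `model`, `optimizer`, and the tensors are placeholders.

```python
# Minimal loss-proportional importance-sampling step; scores and names are illustrative.
import torch
import torch.nn.functional as F

def importance_sampled_step(model, optimizer, inputs, targets, batch=32):
    with torch.no_grad():
        scores = F.cross_entropy(model(inputs), targets, reduction="none")  # per-example scores
    probs = (scores + 1e-8) / (scores + 1e-8).sum()
    idx = torch.multinomial(probs, batch, replacement=True)                 # hard examples sampled more often
    weights = 1.0 / (len(inputs) * probs[idx])                              # importance weights for unbiasedness
    loss = (weights * F.cross_entropy(model(inputs[idx]), targets[idx],
                                      reduction="none")).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```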

Parallel coordinate descent methods for big data optimization

P Richtárik, M Takáč - Mathematical Programming, 2016 - Springer
In this work we show that randomized (block) coordinate descent methods can be
accelerated by parallelization when applied to the problem of minimizing the sum of a …
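
For orientation, here is a serial simulation of one flavor of parallel (block) coordinate descent on least squares; the paper's ESO-based step sizes are replaced by a crude damping by the block size, which is safe but conservative, so this is only a sketch.

```python
# Serial simulation of a parallel block coordinate descent step for
# f(x) = 0.5 * ||A x - b||^2; damping by tau stands in for the paper's ESO step sizes.
import numpy as np

def parallel_coordinate_descent(A, b, tau=4, iters=5000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    L = (A ** 2).sum(axis=0)                         # per-coordinate Lipschitz constants ||A[:, i]||^2
    x = np.zeros(d)
    r = A @ x - b                                    # maintain the residual A x - b
    for _ in range(iters):
        S = rng.choice(d, size=tau, replace=False)   # random block of coordinates
        grads = A[:, S].T @ r                        # partial derivatives on the block
        delta = -grads / (tau * L[S])                # damped per-coordinate steps
        x[S] += delta
        r += A[:, S] @ delta                         # cheap residual update
    return x
```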

Accelerated, parallel, and proximal coordinate descent

O Fercoq, P Richtárik - SIAM Journal on Optimization, 2015 - SIAM
We propose a new randomized coordinate descent method for minimizing the sum of
convex functions each of which depends on a small number of coordinates only. Our method …
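
The sketch below shows only the basic proximal coordinate update such methods build on, applied to the lasso objective; the acceleration and parallelization machinery of the paper is omitted.

```python
# Proximal coordinate descent sketch for 0.5 * ||A x - b||^2 + lam * ||x||_1.
import numpy as np

def soft_threshold(v, t):
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_coordinate_descent(A, b, lam=0.1, iters=5000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A.shape
    L = (A ** 2).sum(axis=0)                         # coordinate-wise Lipschitz constants
    x = np.zeros(d)
    r = A @ x - b
    for _ in range(iters):
        i = rng.integers(d)                          # pick one coordinate uniformly at random
        g = A[:, i] @ r                              # partial derivative of the smooth part
        x_new = soft_threshold(x[i] - g / L[i], lam / L[i])  # proximal step on lam * |x_i|
        r += A[:, i] * (x_new - x[i])                # update residual
        x[i] = x_new
    return x
```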

A unified theory of SGD: Variance reduction, sampling, quantization and coordinate descent

E Gorbunov, F Hanzely… - … Conference on Artificial …, 2020 - proceedings.mlr.press
In this paper we introduce a unified analysis of a large family of variants of proximal
stochastic gradient descent (SGD) which so far have required different intuitions …

Distributed optimization with arbitrary local solvers

C Ma, J Konečný, M Jaggi, V Smith… - Optimization Methods …, 2017 - Taylor & Francis
With the growth of data and necessity for distributed optimization methods, solvers that work
well on a single machine must be re-designed to leverage distributed computation. Recent …

Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications

A Chambolle, MJ Ehrhardt, P Richtárik… - SIAM Journal on …, 2018 - SIAM
We propose a stochastic extension of the primal-dual hybrid gradient algorithm studied by
Chambolle and Pock in 2011 to solve saddle point problems that are separable in the dual …
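
For context, here is a sketch of the deterministic primal-dual hybrid gradient (Chambolle-Pock) iteration on a small lasso-type saddle-point problem; the stochastic extension studied in the paper, which updates only a random subset of dual blocks per iteration, is not reproduced here.

```python
# Deterministic PDHG sketch for min_x lam*||x||_1 + 0.5*||K x - b||^2,
# written as min_x max_y <K x, y> - f*(y) + g(x) with f(z) = 0.5*||z - b||^2.
import numpy as np

def pdhg(K, b, lam=0.1, iters=2000):
    n, d = K.shape
    L = np.linalg.norm(K, 2)            # operator norm; need sigma * tau * L^2 <= 1
    sigma = tau = 1.0 / L
    x = np.zeros(d); x_bar = x.copy(); y = np.zeros(n)
    for _ in range(iters):
        # dual step: prox of f*(y) = 0.5*||y||^2 + <b, y>
        y = (y + sigma * (K @ x_bar) - sigma * b) / (1.0 + sigma)
        # primal step: prox of g(x) = lam*||x||_1 (soft-thresholding)
        x_new = x - tau * (K.T @ y)
        x_new = np.sign(x_new) * np.maximum(np.abs(x_new) - tau * lam, 0.0)
        x_bar = 2 * x_new - x           # extrapolation with theta = 1
        x = x_new
    return x
```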

Adding vs. averaging in distributed primal-dual optimization

C Ma, V Smith, M Jaggi, M Jordan… - International …, 2015 - proceedings.mlr.press
Distributed optimization methods for large-scale machine learning suffer from a
communication bottleneck. It is difficult to reduce this bottleneck while still efficiently and …

Variance reduction is an antidote to Byzantines: Better rates, weaker assumptions and communication compression as a cherry on the top

E Gorbunov, S Horváth, P Richtárik, G Gidel - arXiv preprint arXiv …, 2022 - arxiv.org
Byzantine-robustness has been gaining a lot of attention due to growing interest in
collaborative and federated learning. However, many fruitful directions, such as the usage of …

Stochastic gradient descent-ascent and consensus optimization for smooth games: Convergence analysis under expected co-coercivity

N Loizou, H Berard, G Gidel… - Advances in …, 2021 - proceedings.neurips.cc
Two of the most prominent algorithms for solving unconstrained smooth games are the
classical stochastic gradient descent-ascent (SGDA) and the recently introduced stochastic …
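
A minimal sketch of plain simultaneous SGDA on a strongly-convex-strongly-concave quadratic game, with one sample of the coupling term per step; the step-size conditions from the expected co-coercivity analysis are not reproduced, and all names are illustrative.

```python
# Simultaneous SGDA for min_x max_y 0.5*lam*||x||^2 + x^T M y - 0.5*lam*||y||^2,
# where M = (1/n) * sum_i outer(a_i, b_i) and one (a_i, b_i) pair is sampled per step.
import numpy as np

def sgda(A_rows, B_rows, lam=1.0, stepsize=0.01, iters=5000, seed=0):
    rng = np.random.default_rng(seed)
    n, d = A_rows.shape
    x = rng.normal(size=d)
    y = rng.normal(size=B_rows.shape[1])
    for _ in range(iters):
        i = rng.integers(n)                          # one stochastic sample of the coupling matrix
        gx = lam * x + A_rows[i] * (B_rows[i] @ y)   # unbiased estimate of grad_x
        gy = B_rows[i] * (A_rows[i] @ x) - lam * y   # unbiased estimate of grad_y
        x, y = x - stepsize * gx, y + stepsize * gy  # descend in x, ascend in y, simultaneously
    return x, y

# Example with a 5-dimensional x and 3-dimensional y.
rng = np.random.default_rng(2)
x_star, y_star = sgda(rng.normal(size=(100, 5)), rng.normal(size=(100, 3)))
```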