Advances in asynchronous parallel and distributed optimization
Motivated by large-scale optimization problems arising in the context of machine learning,
there have been several advances in the study of asynchronous parallel and distributed …
An online and unified algorithm for projection matrix vector multiplication with application to empirical risk minimization
Online matrix vector multiplication is a fundamental step and bottleneck in many machine
learning algorithms. It is defined as follows: given a matrix at the pre-processing phase, at …
Acceleration for compressed gradient descent in distributed and federated optimization
Due to the high communication cost in distributed and federated learning problems,
methods relying on compression of communicated messages are becoming increasingly …
Local SGD: Unified theory and new efficient methods
We present a unified framework for analyzing local SGD methods in the convex and strongly
convex regimes for distributed/federated training of supervised machine learning models …
A unified theory of SGD: Variance reduction, sampling, quantization and coordinate descent
In this paper we introduce a unified analysis of a large family of variants of proximal
stochastic gradient descent (SGD) which so far have required different intuitions …
Gradients without backpropagation
Using backpropagation to compute gradients of objective functions for optimization has
remained a mainstay of machine learning. Backpropagation, or reverse-mode differentiation …
Linearly converging error compensated SGD
In this paper, we propose a unified analysis of variants of distributed SGD with arbitrary
compressions and delayed updates. Our framework is general enough to cover different …
Stochastic gradient descent-ascent: Unified theory and new efficient methods
Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent
algorithms for solving min-max optimization and variational inequalities problems (VIP) …
Efficient SGD neural network training via sublinear activated neuron identification
Deep learning has been widely used in many fields, but the model training process usually
consumes massive computational resources and time. Therefore, designing an efficient …
A hybrid stochastic optimization framework for composite nonconvex optimization
We introduce a new approach to develop stochastic optimization algorithms for a class of
stochastic composite and possibly nonconvex optimization problems. The main idea is to …