Advances in asynchronous parallel and distributed optimization

M Assran, A Aytekin, HR Feyzmahdavian… - Proceedings of the …, 2020 - ieeexplore.ieee.org
Motivated by large-scale optimization problems arising in the context of machine learning,
there have been several advances in the study of asynchronous parallel and distributed …
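As a concrete illustration of the asynchrony this survey covers, here is a minimal single-process sketch in which every applied gradient is a few iterations stale, the defining feature of asynchronous parallel updates. The name `delayed_sgd`, the fixed delay, and the toy quadratic are our own illustrative assumptions, not an algorithm from the paper.

```python
import numpy as np
from collections import deque

def delayed_sgd(grad, theta, steps=500, delay=5, lr=0.05):
    """Gradient descent in which every applied gradient is `delay` steps stale,
    mimicking an asynchronous worker that read an older copy of the iterate."""
    pending = deque()                                 # gradients still "in flight"
    for _ in range(steps):
        pending.append(grad(theta))                   # worker reads the current iterate
        if len(pending) > delay:
            theta = theta - lr * pending.popleft()    # server applies a stale gradient
    return theta

# Toy strongly convex objective f(theta) = 0.5 * theta^T A theta (optimum at 0).
A = np.diag([2.0, 1.0, 0.5])
print(delayed_sgd(lambda th: A @ th, np.array([1.0, -2.0, 3.0])))  # close to [0, 0, 0]
```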

An online and unified algorithm for projection matrix vector multiplication with application to empirical risk minimization

L Qin, Z Song, L Zhang, D Zhuo - … Conference on Artificial …, 2023 - proceedings.mlr.press
Online matrix vector multiplication is a fundamental step and bottleneck in many machine
learning algorithms. It is defined as follows: given a matrix at the pre-processing phase, at …
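To pin down the problem statement the snippet begins to give, here is a naive baseline sketch of online matrix-vector multiplication: the matrix is fixed during a pre-processing phase, and each query vector must be answered before the next arrives. The `OnlineMatVec` class and its O(nd)-per-query answer are our own illustrative assumptions; the paper's faster data structure is not shown here.

```python
import numpy as np

class OnlineMatVec:
    """Naive baseline for online matrix-vector multiplication: the matrix is
    fixed in a pre-processing phase, then query vectors arrive one at a time
    and each product must be returned before the next query is revealed."""

    def __init__(self, M):
        self.M = np.asarray(M, dtype=float)   # pre-processing: store the matrix

    def query(self, v):
        return self.M @ v                     # online phase: O(n*d) work per query

rng = np.random.default_rng(0)
omv = OnlineMatVec(rng.standard_normal((4, 3)))
print(omv.query(rng.standard_normal(3)))      # one online answer
```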

Acceleration for compressed gradient descent in distributed and federated optimization

Z Li, D Kovalev, X Qian, P Richtárik - arXiv preprint arXiv:2002.11364, 2020 - arxiv.org
Due to the high communication cost in distributed and federated learning problems,
methods relying on compression of communicated messages are becoming increasingly …
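As a hedged illustration of "compression of communicated messages", the sketch below runs distributed gradient descent where each worker transmits an unbiased random-k sparsification of its gradient. The compressor and the toy quadratics are our assumptions; this is the plain compressed-GD baseline, not the accelerated method proposed in the paper.

```python
import numpy as np

def rand_k(x, k, rng):
    """Unbiased random-k sparsification: keep k random coordinates and
    rescale by d/k so that E[rand_k(x)] = x."""
    out = np.zeros_like(x)
    idx = rng.choice(x.size, size=k, replace=False)
    out[idx] = x[idx] * (x.size / k)
    return out

def compressed_gd(grads, theta, steps=600, k=2, lr=0.05, seed=0):
    """Distributed GD sketch: each worker sends only a compressed gradient;
    the server averages the compressed messages and takes a step."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        msgs = [rand_k(g(theta), k, rng) for g in grads]   # cheap to communicate
        theta = theta - lr * np.mean(msgs, axis=0)
    return theta

# Two workers holding quadratics f_i(theta) = 0.5 * ||theta - c_i||^2.
c = [np.array([1.0, 0.0, -1.0, 2.0]), np.array([-1.0, 2.0, 1.0, 0.0])]
grads = [lambda th, ci=ci: th - ci for ci in c]
print(compressed_gd(grads, np.zeros(4)))   # roughly mean(c) = [0, 1, 0, 1], up to compression noise
```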

Local SGD: Unified theory and new efficient methods

E Gorbunov, F Hanzely… - … Conference on Artificial …, 2021 - proceedings.mlr.press
We present a unified framework for analyzing local SGD methods in the convex and strongly
convex regimes for distributed/federated training of supervised machine learning models …
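The basic local SGD loop that such a unified theory covers can be sketched as follows: each worker takes several SGD steps on its own objective, and the local models are averaged once per communication round. The worker objectives, step counts, and noise level below are illustrative assumptions, not the paper's new methods.

```python
import numpy as np

def local_sgd(c, rounds=50, local_steps=10, lr=0.1, noise=0.1, seed=0):
    """Local SGD sketch: each worker runs `local_steps` SGD steps on its own
    objective, and the local models are averaged once per communication round."""
    rng = np.random.default_rng(seed)
    theta = np.zeros_like(c[0])                          # common starting point
    for _ in range(rounds):
        local_models = []
        for ci in c:                                     # in practice: in parallel
            th = theta.copy()
            for _ in range(local_steps):                 # worker i holds f_i = 0.5*||th - c_i||^2
                g = (th - ci) + noise * rng.standard_normal(th.shape)
                th -= lr * g
            local_models.append(th)
        theta = np.mean(local_models, axis=0)            # the only communication step
    return theta

c = [np.array([2.0, 0.0]), np.array([0.0, 2.0]), np.array([-2.0, 1.0])]
print(local_sgd(c))   # roughly mean(c) = [0, 1], up to stochastic noise
```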

A unified theory of SGD: Variance reduction, sampling, quantization and coordinate descent

E Gorbunov, F Hanzely… - … Conference on Artificial …, 2020 - proceedings.mlr.press
In this paper we introduce a unified analysis of a large family of variants of proximal
stochastic gradient descent (SGD) which so far have required different intuitions …
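One member of the proximal SGD family covered by such unified analyses is plain proximal SGD with an l1 regularizer, sketched below under our own toy assumptions (a quadratic smooth part observed through Gaussian gradient noise); the soft-thresholding prox is standard, everything else is illustrative.

```python
import numpy as np

def soft_threshold(x, t):
    """Proximal operator of t * ||x||_1 (soft-thresholding)."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)

def prox_sgd(grad_sample, theta, steps=2000, lr=0.01, lam=0.1, seed=0):
    """Proximal SGD: a stochastic gradient step on the smooth part, followed
    by the prox of the nonsmooth regularizer lam * ||theta||_1."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        theta = soft_threshold(theta - lr * grad_sample(theta, rng), lr * lam)
    return theta

# Smooth part f(theta) = 0.5 * ||theta - b||^2, seen only through noisy gradients.
b = np.array([1.0, 0.05, -0.5, 0.0])
grad_sample = lambda th, rng: (th - b) + 0.1 * rng.standard_normal(th.shape)
print(prox_sgd(grad_sample, np.zeros_like(b)))   # roughly [0.9, 0, -0.4, 0]: small coordinates shrink to (near) zero
```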

Gradients without backpropagation

AG Baydin, BA Pearlmutter, D Syme, F Wood… - arXiv preprint arXiv …, 2022 - arxiv.org
Using backpropagation to compute gradients of objective functions for optimization has
remained a mainstay of machine learning. Backpropagation, or reverse-mode differentiation …
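The forward-mode alternative alluded to here can be illustrated with a "forward gradient" estimator: a single directional derivative (a Jacobian-vector product, no reverse pass) taken along a random direction yields an unbiased gradient estimate. In the sketch below the JVP is supplied analytically for a toy quadratic; the function names, step size, and iteration count are our assumptions.

```python
import numpy as np

def forward_gradient(jvp, theta, rng):
    """One 'forward gradient' sample: an unbiased gradient estimate built from a
    single directional derivative (no reverse pass). `jvp(theta, v)` must return
    the directional derivative grad f(theta) . v."""
    v = rng.standard_normal(theta.shape)   # random tangent direction, E[v v^T] = I
    return jvp(theta, v) * v               # E[(g . v) v] = g

# Toy objective f(theta) = 0.5 * theta^T A theta, whose JVP is theta^T A v.
rng = np.random.default_rng(0)
A = np.array([[3.0, 1.0], [1.0, 2.0]])
jvp = lambda theta, v: theta @ A @ v

theta = np.array([1.0, -1.0])
for _ in range(200):                       # plain SGD driven by forward gradients
    theta = theta - 0.05 * forward_gradient(jvp, theta, rng)
print(theta)                               # should approach the minimizer [0, 0]
```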

Linearly converging error compensated SGD

E Gorbunov, D Kovalev… - Advances in Neural …, 2020 - proceedings.neurips.cc
In this paper, we propose a unified analysis of variants of distributed SGD with arbitrary
compressions and delayed updates. Our framework is general enough to cover different …
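Error compensation can be sketched in its simplest single-node form: whatever the compressor discards is stored and added back to the next message, so nothing is permanently lost. The top-k compressor, step size, and toy quadratic below are our illustrative assumptions, not the specific distributed variants analyzed in the paper.

```python
import numpy as np

def top_k(x, k):
    """Biased top-k compressor: keep the k largest-magnitude entries."""
    out = np.zeros_like(x)
    idx = np.argpartition(np.abs(x), -k)[-k:]
    out[idx] = x[idx]
    return out

def ef_sgd(grad, theta, steps=400, k=1, lr=0.1):
    """Error-compensated SGD sketch: the residual discarded by the compressor
    is stored and added back to the next message instead of being lost."""
    error = np.zeros_like(theta)
    for _ in range(steps):
        p = lr * grad(theta) + error   # add back what earlier compressions dropped
        delta = top_k(p, k)            # this is all that would be communicated
        error = p - delta              # carry the residual forward
        theta = theta - delta
    return theta

# Quadratic f(theta) = 0.5 * ||theta - b||^2: only one coordinate is transmitted
# per step, yet error feedback recovers the full update over time.
b = np.array([1.0, -0.5, 0.25])
print(ef_sgd(lambda th: th - b, np.zeros(3)))   # should approach b
```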

Stochastic gradient descent-ascent: Unified theory and new efficient methods

A Beznosikov, E Gorbunov… - International …, 2023 - proceedings.mlr.press
Stochastic Gradient Descent-Ascent (SGDA) is one of the most prominent
algorithms for solving min-max optimization and variational inequalities problems (VIP) …
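A textbook SGDA loop on a toy strongly-convex-strongly-concave saddle problem looks as follows; the objective, noise model, and step size are our assumptions, and none of the paper's new variants are reproduced here.

```python
import numpy as np

def sgda(grad_x, grad_y, x, y, steps=3000, lr=0.02, noise=0.1, seed=0):
    """Stochastic gradient descent-ascent: descend in the min variable x and
    ascend in the max variable y, using noisy gradient estimates."""
    rng = np.random.default_rng(seed)
    for _ in range(steps):
        gx = grad_x(x, y) + noise * rng.standard_normal(x.shape)
        gy = grad_y(x, y) + noise * rng.standard_normal(y.shape)
        x, y = x - lr * gx, y + lr * gy   # simultaneous descent/ascent update
    return x, y

# Strongly-convex-strongly-concave toy: f(x, y) = 0.5*x^2 + x*y - 0.5*y^2,
# whose unique saddle point is (0, 0).
grad_x = lambda x, y: x + y
grad_y = lambda x, y: x - y
print(sgda(grad_x, grad_y, np.array([2.0]), np.array([-1.0])))   # near (0, 0), up to noise
```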

Efficient SGD neural network training via sublinear activated neuron identification

L Qin, Z Song, Y Yang - 2024 IEEE International Conference on …, 2024 - ieeexplore.ieee.org
Deep learning has been widely used in many fields, but the model training process usually
consumes massive computational resources and time. Therefore, designing an efficient …

A hybrid stochastic optimization framework for composite nonconvex optimization

Q Tran-Dinh, NH Pham, DT Phan… - Mathematical Programming, 2022 - Springer
We introduce a new approach to develop stochastic optimization algorithms for a class of
stochastic composite and possibly nonconvex optimization problems. The main idea is to …