Advances in asynchronous parallel and distributed optimization

M Assran, A Aytekin, HR Feyzmahdavian… - Proceedings of the …, 2020 - ieeexplore.ieee.org
Motivated by large-scale optimization problems arising in machine learning, researchers
have made several advances in the study of asynchronous parallel and distributed …

First analysis of local GD on heterogeneous data

A Khaled, K Mishchenko, P Richtárik - arXiv preprint arXiv:1909.04715, 2019 - arxiv.org
We provide the first convergence analysis of local gradient descent for minimizing the
average of smooth and convex but otherwise arbitrary functions. Problems of this form and …
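The snippet cuts off, but the method being analyzed is simple to state: each of M workers runs a few plain gradient-descent steps on its own local objective, and the resulting iterates are periodically averaged. Below is a minimal NumPy sketch of that local GD loop; the heterogeneous quadratic local objectives, step size, and round counts are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, M = 10, 5                                  # dimension, number of workers
# Heterogeneous local objectives f_i(x) = 0.5 * ||A_i x - b_i||^2 (illustrative data)
A = [rng.standard_normal((20, d)) for _ in range(M)]
b = [rng.standard_normal(20) for _ in range(M)]

def local_grad(i, x):
    """Gradient of worker i's local objective."""
    return A[i].T @ (A[i] @ x - b[i])

def local_gd(rounds=50, local_steps=5, lr=0.01):
    x = np.zeros(d)                           # shared model
    for _ in range(rounds):
        iterates = []
        for i in range(M):                    # each worker starts from the shared model
            xi = x.copy()
            for _ in range(local_steps):      # H local gradient steps on f_i only
                xi -= lr * local_grad(i, xi)
            iterates.append(xi)
        x = np.mean(iterates, axis=0)         # communication round: average the iterates
    return x

x_hat = local_gd()
avg_grad = np.mean([local_grad(i, x_hat) for i in range(M)], axis=0)
print("norm of average gradient:", np.linalg.norm(avg_grad))
```

The point of the analysis is how the number of local steps between averaging rounds interacts with the heterogeneity of the f_i; the sketch only fixes the structure of the iteration.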

Asynchronous SGD beats minibatch SGD under arbitrary delays

K Mishchenko, F Bach, M Even… - Advances in Neural …, 2022 - proceedings.neurips.cc
The existing analysis of asynchronous stochastic gradient descent (SGD) degrades
dramatically when any delay is large, giving the impression that performance depends …
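The object of study here is the delayed update x_{t+1} = x_t - lr * g(x_{t - tau_t}), where the gradient is computed at a stale iterate. A toy single-process simulation of that update follows; the bounded-delay distribution, least-squares objective, and step size are my own illustrative choices, not the paper's arbitrary-delay setting.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(1)
d = 10
A = rng.standard_normal((100, d))
b = rng.standard_normal(100)

def stoch_grad(x):
    """Single-sample stochastic gradient of 0.5*||Ax - b||^2 / n (illustrative objective)."""
    j = rng.integers(len(b))
    return A[j] * (A[j] @ x - b[j])

def async_sgd(steps=2000, lr=0.005, max_delay=20):
    x = np.zeros(d)
    history = deque([x.copy()], maxlen=max_delay + 1)   # past iterates workers may have read
    for _ in range(steps):
        tau = rng.integers(len(history))                # random delay (bounded in this toy)
        stale_x = history[-1 - tau]                     # gradient is computed at a stale iterate
        x = x - lr * stoch_grad(stale_x)                # server applies the delayed gradient
        history.append(x.copy())
    return x

x_hat = async_sgd()
print("final objective:", 0.5 * np.mean((A @ x_hat - b) ** 2))
```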

Predicting dynamic spectrum allocation: a review covering simulation, modelling, and prediction

AC Cullen, BIP Rubinstein, S Kandeepan… - Artificial Intelligence …, 2023 - Springer
The advent of the Internet of Things and 5G has further accelerated the growth in devices
attempting to gain access to the wireless spectrum. A consequence of this has been the …

Asynchronous SGD on graphs: a unified framework for asynchronous decentralized and federated optimization

M Even, A Koloskova… - … Conference on Artificial …, 2024 - proceedings.mlr.press
Decentralized and asynchronous communications are two popular techniques for reducing the
communication complexity of distributed machine learning, by respectively removing the …
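To fix ideas about the decentralized side, here is a synchronous gossip-SGD sketch on a ring graph, where each node takes a local stochastic gradient step and then averages with its neighbours through a mixing matrix W. The unified framework in the paper also covers asynchronous communication patterns, which this toy synchronous loop does not model; graph, data, and step size are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
d, n = 10, 8                                  # dimension, number of nodes
A = [rng.standard_normal((30, d)) for _ in range(n)]
b = [rng.standard_normal(30) for _ in range(n)]

# Doubly stochastic mixing matrix for a ring: average with both neighbours.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

def stoch_grad(i, x):
    """Single-sample stochastic gradient of node i's local least-squares objective."""
    j = rng.integers(len(b[i]))
    return A[i][j] * (A[i][j] @ x - b[i][j])

def decentralized_sgd(steps=3000, lr=0.01):
    X = np.zeros((n, d))                      # one model per node
    for _ in range(steps):
        G = np.stack([stoch_grad(i, X[i]) for i in range(n)])
        X = W @ (X - lr * G)                  # local step followed by gossip averaging
    return X.mean(axis=0)

x_hat = decentralized_sgd()
print("average objective:",
      0.5 * np.mean([np.mean((A[i] @ x_hat - b[i]) ** 2) for i in range(n)]))
```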

Stochastic Newton and cubic Newton methods with simple local linear-quadratic rates

D Kovalev, K Mishchenko, P Richtárik - arXiv preprint arXiv:1912.01597, 2019 - arxiv.org
We present two new, remarkably simple stochastic second-order methods for minimizing the
average of a very large number of sufficiently smooth and strongly convex functions. The first …
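A hedged sketch of the stochastic Newton iteration studied in this line of work: keep a stored point w_i for every function, take a Newton-type step built from the Hessians and gradients at the stored points, then refresh only a random subset of the w_i. The logistic-regression data, batch size, and the naive recomputation of all terms each step are my simplifications; an efficient implementation would maintain running sums instead.

```python
import numpy as np

rng = np.random.default_rng(3)
d, n, mu = 5, 50, 0.1
A = rng.standard_normal((n, d))
y = rng.choice([-1.0, 1.0], size=n)

def grad_i(i, w):
    """Gradient of f_i(w) = log(1 + exp(-y_i a_i^T w)) + (mu/2)||w||^2."""
    s = 1.0 / (1.0 + np.exp(y[i] * A[i] @ w))
    return -y[i] * s * A[i] + mu * w

def hess_i(i, w):
    """Hessian of f_i at w."""
    p = 1.0 / (1.0 + np.exp(-y[i] * A[i] @ w))
    return p * (1.0 - p) * np.outer(A[i], A[i]) + mu * np.eye(d)

def stochastic_newton(steps=200, batch=5):
    W = np.zeros((n, d))                     # stored point w_i for every function
    x = np.zeros(d)
    for _ in range(steps):
        H = np.mean([hess_i(i, W[i]) for i in range(n)], axis=0)
        v = np.mean([hess_i(i, W[i]) @ W[i] - grad_i(i, W[i]) for i in range(n)], axis=0)
        x = np.linalg.solve(H, v)            # Newton-type step built from the stored points
        for i in rng.choice(n, size=batch, replace=False):
            W[i] = x                         # refresh only a random subset of stored points
    return x

x_hat = stochastic_newton()
full_grad = np.mean([grad_i(i, x_hat) for i in range(n)], axis=0)
print("norm of full gradient:", np.linalg.norm(full_grad))
```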

Moshpit SGD: Communication-efficient decentralized training on heterogeneous unreliable devices

M Ryabinin, E Gorbunov… - Advances in …, 2021 - proceedings.neurips.cc
Training deep neural networks on large datasets can often be accelerated by using multiple
compute nodes. This approach, known as distributed training, can utilize hundreds of …

Optimal time complexities of parallel stochastic optimization methods under a fixed computation model

A Tyurin, P Richtárik - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Parallelization is a popular strategy for improving the performance of methods. Optimization
methods are no exception: the design of efficient parallel optimization methods and tight …

Adaptive catalyst for smooth convex optimization

A Ivanova, D Pasechnyuk, D Grishchenko… - … on Optimization and …, 2021 - Springer
In this paper, we present a generic framework for accelerating almost any non-accelerated
deterministic or randomized algorithm for smooth convex optimization …
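Catalyst-style acceleration wraps a non-accelerated inner solver in a sequence of regularized proximal subproblems with Nesterov-type extrapolation of the prox centers. The sketch below uses plain gradient descent as the inner solver and a fixed regularization parameter kappa; the adaptive choice of kappa, which is the point of this paper, is not modeled, and the objective and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
d = 20
A = rng.standard_normal((100, d))
b = rng.standard_normal(100)

def grad_f(x):
    """Gradient of the smooth convex objective f(x) = 0.5*||Ax - b||^2 / n (illustrative)."""
    return A.T @ (A @ x - b) / len(b)

def inner_gd(grad, x0, lr, iters):
    """Non-accelerated base method: plain gradient descent on the subproblem."""
    x = x0.copy()
    for _ in range(iters):
        x -= lr * grad(x)
    return x

def catalyst(outer=30, inner=50, kappa=1.0):
    L = np.linalg.norm(A, 2) ** 2 / len(b)          # smoothness constant of f
    q = 0.0                                         # treat f as merely convex
    x = np.zeros(d)
    y = x.copy()
    alpha = 1.0
    for _ in range(outer):
        # Approximately minimize f(z) + (kappa/2)||z - y||^2 with the base method.
        sub_grad = lambda z, y=y: grad_f(z) + kappa * (z - y)
        x_new = inner_gd(sub_grad, x, lr=1.0 / (L + kappa), iters=inner)
        # Nesterov-style extrapolation of the prox centers.
        alpha_new = 0.5 * (q - alpha**2 + np.sqrt((alpha**2 - q) ** 2 + 4 * alpha**2))
        beta = alpha * (1 - alpha) / (alpha**2 + alpha_new)
        y = x_new + beta * (x_new - x)
        x, alpha = x_new, alpha_new
    return x

x_hat = catalyst()
print("||grad f||:", np.linalg.norm(grad_f(x_hat)))
```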

Robust distributed accelerated stochastic gradient methods for multi-agent networks

A Fallah, M Gürbüzbalaban, A Ozdaglar… - Journal of machine …, 2022 - jmlr.org
We study the distributed stochastic gradient (D-SG) method and its accelerated variant (D-ASG)
for solving decentralized strongly convex stochastic optimization problems where the …
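One common form of the accelerated variant adds Nesterov-style extrapolation on top of the gossip-plus-gradient structure used in the decentralized SGD sketch earlier in this list. The version below follows that generic pattern rather than the paper's exact parameterization; the ring network, data, step size, and momentum value are assumptions.

```python
import numpy as np

rng = np.random.default_rng(5)
d, n = 10, 6                                  # dimension, number of agents
A = [rng.standard_normal((40, d)) for _ in range(n)]
b = [rng.standard_normal(40) for _ in range(n)]

# Doubly stochastic mixing matrix for a ring network of agents.
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = 0.5
    W[i, (i - 1) % n] = 0.25
    W[i, (i + 1) % n] = 0.25

def stoch_grad(i, x):
    """Single-sample stochastic gradient of agent i's regularized least-squares loss."""
    j = rng.integers(len(b[i]))
    return A[i][j] * (A[i][j] @ x - b[i][j]) + 0.1 * x   # l2 term makes the loss strongly convex

def d_asg(steps=3000, lr=0.003, beta=0.7):
    X = np.zeros((n, d))                      # agents' iterates
    Y = np.zeros((n, d))                      # extrapolated (momentum) iterates
    for _ in range(steps):
        G = np.stack([stoch_grad(i, Y[i]) for i in range(n)])
        X_new = W @ Y - lr * G                # consensus step on Y plus local stochastic gradient
        Y = X_new + beta * (X_new - X)        # Nesterov-style extrapolation per agent
        X = X_new
    return X.mean(axis=0)

x_hat = d_asg()
print("average loss:",
      np.mean([0.5 * np.mean((A[i] @ x_hat - b[i]) ** 2) for i in range(n)]))
```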