Advances in asynchronous parallel and distributed optimization
Motivated by large-scale optimization problems arising in the context of machine learning,
there have been several advances in the study of asynchronous parallel and distributed …
First analysis of local GD on heterogeneous data
We provide the first convergence analysis of local gradient descent for minimizing the
average of smooth and convex but otherwise arbitrary functions. Problems of this form and …
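Below is a minimal Python sketch of the local gradient descent scheme this entry refers to: each worker runs several local GD steps on its own objective, and the iterates are periodically averaged. The quadratic local objectives, step size, and number of local steps are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
M, d, H, lr, rounds = 4, 5, 10, 0.02, 50
# Hypothetical quadratic local objectives f_m(x) = 0.5 * ||A_m x - b_m||^2;
# the paper covers general smooth convex f_m, this is only for illustration.
A = [rng.standard_normal((8, d)) for _ in range(M)]
b = [rng.standard_normal(8) for _ in range(M)]

def grad(m, x):
    return A[m].T @ (A[m] @ x - b[m])

x = np.zeros(d)                        # shared iterate
for _ in range(rounds):
    local = []
    for m in range(M):                 # each worker runs H local GD steps
        y = x.copy()
        for _ in range(H):
            y -= lr * grad(m, y)
        local.append(y)
    x = np.mean(local, axis=0)         # periodic averaging (communication)

print("average objective:",
      np.mean([0.5 * np.linalg.norm(A[m] @ x - b[m])**2 for m in range(M)]))
```

With heterogeneous data the local steps drift toward each worker's own minimizer, which is exactly the effect a convergence analysis of this scheme has to control.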
Asynchronous SGD beats minibatch SGD under arbitrary delays
The existing analysis of asynchronous stochastic gradient descent (SGD) degrades
dramatically when any delay is large, giving the impression that performance depends …
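A toy Python simulation of the delayed-gradient model behind asynchronous SGD: gradients are computed at the current iterate but only applied after an arbitrary (here random) delay. The quadratic problem, delay distribution, and step size are assumptions for illustration, not the paper's setting.

```python
import numpy as np

rng = np.random.default_rng(1)
d, lr, steps, max_delay = 5, 0.1, 300, 8
A = rng.standard_normal((20, d)); b = rng.standard_normal(20)

def stoch_grad(x):
    i = rng.integers(len(b))                 # sample one data point
    return A[i] * (A[i] @ x - b[i])

x = np.zeros(d)
in_flight = []                               # (arrival_step, gradient) pairs
for k in range(steps):
    # a worker starts a gradient at the current iterate; it arrives later
    in_flight.append((k + rng.integers(1, max_delay + 1), stoch_grad(x)))
    # apply every gradient whose (possibly stale) computation has finished
    arrived = [g for t, g in in_flight if t <= k]
    in_flight = [(t, g) for t, g in in_flight if t > k]
    for g in arrived:
        x -= lr * g

print("objective:", 0.5 * np.linalg.norm(A @ x - b)**2 / len(b))
```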
Predicting dynamic spectrum allocation: a review covering simulation, modelling, and prediction
The advent of the Internet of Things and 5G has further accelerated the growth in devices
attempting to gain access to the wireless spectrum. A consequence of this has been the …
Asynchronous SGD on graphs: a unified framework for asynchronous decentralized and federated optimization
Decentralized and asynchronous communications are two popular techniques for reducing the
communication complexity of distributed machine learning, by respectively removing the …
Stochastic Newton and cubic Newton methods with simple local linear-quadratic rates
We present two new remarkably simple stochastic second-order methods for minimizing the
average of a very large number of sufficiently smooth and strongly convex functions. The first …
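As a companion to this entry, here is a hedged Python sketch of a stochastic Newton-type step for minimizing an average of strongly convex functions: keep one snapshot point per function, refresh a random minibatch of snapshots each iteration, and minimize the aggregated second-order model. The l2-regularized logistic regression problem and all constants are illustrative, and the update may differ in details from the exact method analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, tau, lam, iters = 50, 5, 5, 0.1, 40
A = rng.standard_normal((n, d))
y = (rng.random(n) < 0.5).astype(float)

def sigmoid(t): return 1.0 / (1.0 + np.exp(-t))
def grad_i(i, v): return (sigmoid(A[i] @ v) - y[i]) * A[i] + lam * v
def hess_i(i, v):
    p = sigmoid(A[i] @ v)
    return p * (1 - p) * np.outer(A[i], A[i]) + lam * np.eye(d)

x = np.zeros(d)
w = [x.copy() for _ in range(n)]                 # per-function snapshot points
for _ in range(iters):
    for i in rng.choice(n, size=tau, replace=False):
        w[i] = x.copy()                          # refresh a minibatch of snapshots
    Hbar = sum(hess_i(i, w[i]) for i in range(n)) / n
    rhs = sum(hess_i(i, w[i]) @ w[i] - grad_i(i, w[i]) for i in range(n)) / n
    x = np.linalg.solve(Hbar, rhs)               # aggregated Newton-type step

full_grad = sum(grad_i(i, x) for i in range(n)) / n
print("grad norm:", np.linalg.norm(full_grad))
```

The solve step picks the minimizer of the sum of per-function quadratic models built at the snapshot points, which is what gives second-order methods of this type their fast local rates.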
Moshpit SGD: communication-efficient decentralized training on heterogeneous unreliable devices
Training deep neural networks on large datasets can often be accelerated by using multiple
compute nodes. This approach, known as distributed training, can utilize hundreds of …
Optimal time complexities of parallel stochastic optimization methods under a fixed computation model
Parallelization is a popular strategy for improving the performance of iterative algorithms. Optimization
methods are no exception: design of efficient parallel optimization methods and tight …
Adaptive catalyst for smooth convex optimization
In this paper, we present a generic framework that allows accelerating almost arbitrary non-accelerated deterministic and randomized algorithms for smooth convex optimization …
Robust distributed accelerated stochastic gradient methods for multi-agent networks
We study distributed stochastic gradient (D-SG) method and its accelerated variant (D-ASG)
for solving decentralized strongly convex stochastic optimization problems where the …
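To make the setting concrete, here is a minimal Python sketch of the (non-accelerated) distributed stochastic gradient idea on a ring network: each agent averages with its neighbours through a doubly stochastic mixing matrix W and then takes a local stochastic gradient step. The least-squares objectives, ring topology, and step size are illustrative assumptions, not the paper's experimental setup.

```python
import numpy as np

rng = np.random.default_rng(3)
n_agents, d, lr, steps = 5, 4, 0.05, 200
A = [rng.standard_normal((10, d)) for _ in range(n_agents)]
b = [rng.standard_normal(10) for _ in range(n_agents)]

# Doubly stochastic mixing matrix for a ring graph (self + two neighbours).
W = np.zeros((n_agents, n_agents))
for i in range(n_agents):
    W[i, i] = 0.5
    W[i, (i - 1) % n_agents] = 0.25
    W[i, (i + 1) % n_agents] = 0.25

def stoch_grad(i, x):
    j = rng.integers(len(b[i]))                  # one local sample
    return A[i][j] * (A[i][j] @ x - b[i][j])

X = np.zeros((n_agents, d))                      # one iterate per agent
for _ in range(steps):
    mixed = W @ X                                # gossip/averaging step
    X = mixed - lr * np.stack([stoch_grad(i, mixed[i]) for i in range(n_agents)])

x_avg = X.mean(axis=0)
print("consensus error:", np.linalg.norm(X - x_avg, axis=1).max())
```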