Asynchronous parallel stochastic gradient for nonconvex optimization

X Lian, Y Huang, Y Li, J Liu - Advances in neural …, 2015 - proceedings.neurips.cc
Asynchronous parallel implementations of stochastic gradient (SG) have been broadly
used in training deep neural networks and have achieved many successes in practice recently …

Taming the wild: A unified analysis of hogwild-style algorithms

CM De Sa, C Zhang, K Olukotun… - Advances in neural …, 2015 - proceedings.neurips.cc
Stochastic gradient descent (SGD) is a ubiquitous algorithm for a variety of machine learning
problems. Researchers and industry have developed several techniques to optimize SGD's …

Adding vs. averaging in distributed primal-dual optimization

C Ma, V Smith, M Jaggi, M Jordan… - International …, 2015 - proceedings.mlr.press
Distributed optimization methods for large-scale machine learning suffer from a
communication bottleneck. It is difficult to reduce this bottleneck while still efficiently and …

Stochastic quasi-gradient methods: Variance reduction via Jacobian sketching

RM Gower, P Richtárik, F Bach - Mathematical Programming, 2021 - Springer
We develop a new family of variance reduced stochastic gradient descent methods for
minimizing the average of a very large number of smooth functions. Our method—JacSketch …

A primer on coordinate descent algorithms

HJM Shi, S Tu, Y Xu, W Yin - arXiv preprint arXiv:1610.00040, 2016 - arxiv.org
This monograph presents a class of algorithms called coordinate descent algorithms for
mathematicians, statisticians, and engineers outside the field of optimization. This particular …

A comprehensive linear speedup analysis for asynchronous stochastic parallel optimization from zeroth-order to first-order

X Lian, H Zhang, CJ Hsieh… - Advances in Neural …, 2016 - proceedings.neurips.cc
Asynchronous parallel optimization has received substantial success and extensive attention
recently. One of the core theoretical questions is how much speedup (or benefit) the …

A parallel computing approach to solve traffic assignment using path-based gradient projection algorithm

X Chen, Z Liu, K Zhang, Z Wang - Transportation Research Part C …, 2020 - Elsevier
This paper presents a Parallel Block-Coordinate Descent (PBCD) algorithm for solving the
user equilibrium traffic assignment problem. Most of the existing algorithms for the user …

Distributed asynchronous optimization with unbounded delays: How slow can you go?

Z Zhou, P Mertikopoulos, N Bambos… - International …, 2018 - proceedings.mlr.press
One of the most widely used optimization methods for large-scale machine learning
problems is distributed asynchronous stochastic gradient descent (DASGD). However, a key …

Distributed multi-task relationship learning

S Liu, SJ Pan, Q Ho - Proceedings of the 23rd ACM SIGKDD …, 2017 - dl.acm.org
Multi-task learning aims to learn multiple tasks jointly by exploiting their relatedness to
improve the generalization performance for each task. Traditionally, to perform multi-task …

Fastest rates for stochastic mirror descent methods

F Hanzely, P Richtárik - Computational Optimization and Applications, 2021 - Springer
Relative smoothness—a notion introduced in Birnbaum et al. (Proceedings of the 12th ACM
Conference on Electronic Commerce, ACM, pp 127–136, 2011) and recently rediscovered in …