Recent advances in stochastic gradient descent in deep learning

Y Tian, Y Zhang, H Zhang - Mathematics, 2023 - mdpi.com
In the age of artificial intelligence, finding the best approach to handling huge amounts of data is a
tremendously motivating and hard problem. Among machine learning models, stochastic …
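
For orientation, the update at the heart of the methods this survey covers is the plain stochastic gradient step w <- w - lr * g, where g is a gradient estimated on a mini-batch. A minimal sketch on a synthetic least-squares problem (the data, learning rate, and batch size below are illustrative assumptions, not taken from the survey):

    # Minimal mini-batch SGD on a synthetic least-squares problem.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(1000, 10))                              # synthetic data (assumption)
    y = X @ rng.normal(size=10) + 0.1 * rng.normal(size=1000)

    w = np.zeros(10)
    lr, batch = 0.1, 32
    for step in range(500):
        idx = rng.choice(len(X), size=batch, replace=False)      # draw a mini-batch
        g = X[idx].T @ (X[idx] @ w - y[idx]) / batch             # stochastic gradient of the squared loss
        w -= lr * g                                              # SGD step: w <- w - lr * g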

Federated optimization: Distributed machine learning for on-device intelligence

J Konečný, HB McMahan, D Ramage… - arXiv preprint arXiv …, 2016 - arxiv.org
We introduce a new and increasingly relevant setting for distributed optimization in machine
learning, where the data defining the optimization are unevenly distributed over an …
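
To make the setting concrete, one communication round of a generic federated scheme can be sketched as follows: clients with unevenly sized local datasets take a few local SGD steps, and the server averages the resulting models weighted by data size. This FedAvg-style sketch only illustrates the problem setting, not the specific algorithms proposed in the paper; all data and hyperparameters are made up.

    # One generic federated-averaging round over clients with uneven data sizes.
    import numpy as np

    def local_sgd(w, X, y, rng, lr=0.05, steps=10, batch=8):
        w = w.copy()
        for _ in range(steps):
            idx = rng.choice(len(X), size=min(batch, len(X)), replace=False)
            w -= lr * X[idx].T @ (X[idx] @ w - y[idx]) / len(idx)   # local stochastic gradient step
        return w

    rng = np.random.default_rng(1)
    d = 5
    clients = []
    for n in (20, 200, 50):                                         # unevenly distributed data
        X = rng.normal(size=(n, d))
        clients.append((X, X @ np.ones(d) + 0.1 * rng.normal(size=n)))

    w_global = np.zeros(d)
    for round_ in range(20):                                        # communication rounds
        local_models = [local_sgd(w_global, X, y, rng) for X, y in clients]
        sizes = [len(X) for X, _ in clients]
        w_global = np.average(local_models, axis=0, weights=sizes)  # size-weighted model average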

Gradient sparsification for communication-efficient distributed optimization

J Wangni, J Wang, J Liu… - Advances in Neural …, 2018 - proceedings.neurips.cc
Modern large-scale machine learning applications require stochastic optimization
algorithms to be implemented on distributed computational architectures. A key bottleneck is …
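
One widely used way to sparsify a stochastic gradient without biasing it is to keep coordinate i with some probability p_i and rescale the kept value by 1/p_i, so the compressed vector equals the true gradient in expectation. The magnitude-proportional probabilities below are chosen purely for illustration and need not match the probabilities optimized in the paper.

    # Unbiased random sparsification: drop coordinates at random, rescale survivors.
    import numpy as np

    def sparsify(g, budget, rng):
        p = np.minimum(1.0, budget * np.abs(g) / np.abs(g).sum())  # keep-probabilities (illustrative choice)
        keep = rng.random(g.shape) < p
        out = np.zeros_like(g)
        out[keep] = g[keep] / p[keep]                              # rescaling keeps E[out] = g
        return out

    rng = np.random.default_rng(0)
    g = rng.normal(size=1000)
    g_sparse = sparsify(g, budget=100, rng=rng)                    # roughly 100 nonzeros survive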

Optimization methods for large-scale machine learning

L Bottou, FE Curtis, J Nocedal - SIAM Review, 2018 - SIAM
This paper provides a review and commentary on the past, present, and future of numerical
optimization algorithms in the context of machine learning applications. Through case …

Atomo: Communication-efficient learning via atomic sparsification

H Wang, S Sievert, S Liu, Z Charles… - Advances in neural …, 2018 - proceedings.neurips.cc
Distributed model training suffers from communication overheads due to frequent gradient
updates transmitted between compute nodes. To mitigate these overheads, several studies …
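
The "atomic" view can be sketched by writing a matrix gradient in an atom basis such as its singular value decomposition, G = sum_i s_i u_i v_i^T, and randomly keeping atoms with rescaling so the estimate stays unbiased. The sampling probabilities below are a simple heuristic stand-in, not the optimized allocation derived in the paper.

    # Rough illustration: unbiased sparsification of a matrix gradient in its SVD atom basis.
    import numpy as np

    def svd_sparsify(G, budget, rng):
        U, s, Vt = np.linalg.svd(G, full_matrices=False)
        p = np.minimum(1.0, budget * s / s.sum())                  # keep-probabilities proportional to singular values (assumption)
        keep = rng.random(s.shape) < p
        s_hat = np.where(keep, s / np.where(p > 0, p, 1.0), 0.0)   # rescale kept atoms, drop the rest
        return (U * s_hat) @ Vt                                    # low-rank, unbiased estimate of G

    rng = np.random.default_rng(0)
    G = rng.normal(size=(64, 32))
    G_hat = svd_sparsify(G, budget=8, rng=rng)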

Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition

H Karimi, J Nutini, M Schmidt - Joint European conference on machine …, 2016 - Springer
In 1963, Polyak proposed a simple condition that is sufficient to show a global linear
convergence rate for gradient descent. This condition is a special case of the Łojasiewicz …
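
For reference, the Polyak-Łojasiewicz (PL) inequality and the rate it yields can be stated as follows (standard form of the result for an L-smooth function f with minimum value f^*; not quoted from the paper):

    \frac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\bigl(f(x) - f^*\bigr) \quad \text{for all } x,

and gradient descent with step size 1/L then satisfies

    f(x_k) - f^* \;\le\; \Bigl(1 - \frac{\mu}{L}\Bigr)^{k}\,\bigl(f(x_0) - f^*\bigr),

a global linear rate that does not require convexity.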

LAG: Lazily aggregated gradient for communication-efficient distributed learning

T Chen, G Giannakis, T Sun… - Advances in neural …, 2018 - proceedings.neurips.cc
This paper presents a new class of gradient methods for distributed machine learning that
adaptively skip the gradient calculations to learn with reduced communication and …
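
The skipping idea can be sketched as follows: each worker re-transmits its gradient only when it has changed sufficiently since the last copy the server received, and the server aggregates possibly stale gradients in the meantime. The fixed threshold below is a simplified stand-in for the adaptive rule derived in the paper; data and step sizes are made up.

    # Simplified lazily aggregated gradients: skip communication when a worker's
    # gradient has barely changed since the last transmitted copy.
    import numpy as np

    rng = np.random.default_rng(0)
    d, lr, thresh = 5, 0.05, 1e-2
    datasets = []
    for _ in range(4):                                             # four workers
        X = rng.normal(size=(50, d))
        datasets.append((X, X @ np.ones(d) + 0.1 * rng.normal(size=50)))

    w = np.zeros(d)
    last_sent = [np.zeros(d) for _ in datasets]
    for step in range(200):
        for m, (X, y) in enumerate(datasets):
            g = X.T @ (X @ w - y) / len(X)                         # fresh local gradient
            if np.linalg.norm(g - last_sent[m]) > thresh:
                last_sent[m] = g                                   # communicate only on sufficient change
        w -= lr * np.mean(last_sent, axis=0)                       # aggregate (possibly stale) gradients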

Coordinate descent algorithms

SJ Wright - Mathematical programming, 2015 - Springer
Coordinate descent algorithms solve optimization problems by successively performing
approximate minimization along coordinate directions or coordinate hyperplanes. They have …
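
A minimal concrete instance is cyclic coordinate descent on a least-squares objective, where each inner step exactly minimizes along one coordinate while the others are held fixed (synthetic data below, purely for illustration):

    # Cyclic coordinate descent on 0.5 * ||X w - y||^2 with exact coordinate minimization.
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))
    y = X @ rng.normal(size=10) + 0.05 * rng.normal(size=200)

    w = np.zeros(10)
    r = y - X @ w                                # residual, kept up to date incrementally
    col_sq = (X ** 2).sum(axis=0)
    for sweep in range(50):
        for j in range(10):
            step = X[:, j] @ r / col_sq[j]       # exact minimizer along coordinate j
            w[j] += step
            r -= step * X[:, j]                  # cheap residual update after the coordinate move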

Asynchronous parallel stochastic gradient for nonconvex optimization

X Lian, Y Huang, Y Li, J Liu - Advances in neural …, 2015 - proceedings.neurips.cc
Asynchronous parallel implementations of stochastic gradient (SG) have been broadly used
in training deep neural networks and have achieved many successes in practice recently …
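
The pattern being analyzed can be illustrated, very loosely, with threads that read a possibly stale snapshot of shared parameters, compute a stochastic gradient, and apply their update without waiting for one another. Python threads here only show the access pattern; the paper concerns genuinely parallel asynchronous implementations, and the problem, data, and step sizes are made up.

    # Toy asynchronous SGD: workers update shared parameters without synchronization.
    import threading
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.normal(size=(2000, 10))
    y = X @ np.ones(10) + 0.1 * rng.normal(size=2000)
    w = np.zeros(10)                                               # shared parameter vector

    def worker(seed, steps=300, lr=0.05, batch=16):
        local_rng = np.random.default_rng(seed)
        for _ in range(steps):
            idx = local_rng.choice(len(X), size=batch, replace=False)
            snapshot = w.copy()                                    # read a possibly stale copy
            g = X[idx].T @ (X[idx] @ snapshot - y[idx]) / batch    # stochastic gradient at the stale point
            w[:] -= lr * g                                         # in-place update, no lock

    threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()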

A unified algorithmic framework for block-structured optimization involving big data: With applications in machine learning and signal processing

M Hong, M Razaviyayn, ZQ Luo… - IEEE Signal Processing …, 2015 - ieeexplore.ieee.org
This article presents a powerful algorithmic framework for big data optimization, called the
block successive upper-bound minimization (BSUM). The BSUM includes as special cases …
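
The general form of the scheme is worth recalling (standard statement of BSUM, not quoted from the article): at iteration r a block k is selected and updated by minimizing a locally tight upper bound of the objective in that block, while the other blocks stay fixed,

    x_k^{r+1} \in \arg\min_{x_k \in X_k} \; u_k\bigl(x_k;\, x^{r}\bigr), \qquad x_j^{r+1} = x_j^{r} \ \text{for } j \ne k,

where the surrogate satisfies u_k(x_k; x^r) \ge f(x_k, x_{-k}^r) for all x_k and u_k(x_k^r; x^r) = f(x^r). Block coordinate descent and many proximal and majorization-minimization methods fit this template under different choices of u_k.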