SGD: General analysis and improved rates
We propose a general yet simple theorem describing the convergence of SGD under the
arbitrary sampling paradigm. Our theorem describes the convergence of an infinite array of …
Not all samples are created equal: Deep learning with importance sampling
Abstract Deep Neural Network training spends most of the computation on examples that
are properly handled, and could be ignored. We propose to mitigate this phenomenon with a …
Parallel coordinate descent methods for big data optimization
In this work we show that randomized (block) coordinate descent methods can be
accelerated by parallelization when applied to the problem of minimizing the sum of a …
Accelerated, parallel, and proximal coordinate descent
We propose a new randomized coordinate descent method for minimizing the sum of
convex functions each of which depends on a small number of coordinates only. Our method …
A unified theory of SGD: Variance reduction, sampling, quantization and coordinate descent
In this paper we introduce a unified analysis of a large family of variants of proximal
stochastic gradient descent (SGD) which so far have required different intuitions …
Distributed optimization with arbitrary local solvers
With the growth of data and necessity for distributed optimization methods, solvers that work
well on a single machine must be re-designed to leverage distributed computation. Recent …
Stochastic primal-dual hybrid gradient algorithm with arbitrary sampling and imaging applications
We propose a stochastic extension of the primal-dual hybrid gradient algorithm studied by
Chambolle and Pock in 2011 to solve saddle point problems that are separable in the dual …
Adding vs. averaging in distributed primal-dual optimization
Distributed optimization methods for large-scale machine learning suffer from a
communication bottleneck. It is difficult to reduce this bottleneck while still efficiently and …
Variance reduction is an antidote to byzantines: Better rates, weaker assumptions and communication compression as a cherry on the top
Byzantine-robustness has been gaining a lot of attention due to the growth of the interest in
collaborative and federated learning. However, many fruitful directions, such as the usage of …
Stochastic gradient descent-ascent and consensus optimization for smooth games: Convergence analysis under expected co-coercivity
Two of the most prominent algorithms for solving unconstrained smooth games are the
classical stochastic gradient descent-ascent (SGDA) and the recently introduced stochastic …