A survey of distributed optimization

T Yang, X Yi, J Wu, Y Yuan, D Wu, Z Meng… - Annual Reviews in …, 2019 - Elsevier
In distributed optimization of multi-agent systems, agents cooperate to minimize a global
function which is a sum of local objective functions. Motivated by applications including …
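
As context for the snippet above (generic notation, not necessarily the survey's own), the problem this literature studies is the consensus formulation in which each of $n$ agents holds a private objective $f_i$ and the network jointly solves

\[
  \min_{x \in \mathbb{R}^d} \; f(x) \;=\; \sum_{i=1}^{n} f_i(x),
\]

with agents exchanging information only with their neighbors rather than through a central coordinator.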

A review of nonlinear FFT-based computational homogenization methods

M Schneider - Acta Mechanica, 2021 - Springer
Since their inception, computational homogenization methods based on the fast Fourier
transform (FFT) have grown in popularity, establishing themselves as a powerful tool …
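
For orientation (my paraphrase, not the review's wording): these methods solve the periodic cell problem through the Lippmann–Schwinger equation, and the classical Moulinec–Suquet basic scheme iterates

\[
  \varepsilon^{k+1} \;=\; \bar{\varepsilon} \;-\; \Gamma^{0} * \bigl(\sigma(\varepsilon^{k}) - C^{0} : \varepsilon^{k}\bigr),
\]

where the Green operator $\Gamma^{0}$ of a reference medium $C^{0}$ is applied in Fourier space, so each iteration costs essentially one FFT pair on the strain field.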

Lookahead optimizer: k steps forward, 1 step back

M Zhang, J Lucas, J Ba… - Advances in neural …, 2019 - proceedings.neurips.cc
The vast majority of successful deep neural networks are trained using variants of stochastic
gradient descent (SGD) algorithms. Recent attempts to improve SGD can be broadly …
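
A minimal sketch of the scheme the title describes, wrapped here around plain full-batch gradient descent rather than the authors' reference implementation; the step size lr, interpolation factor alpha, and sync period k are illustrative defaults.

```python
import numpy as np

def lookahead_sgd(grad, w0, lr=0.1, alpha=0.5, k=5, outer_steps=100):
    """Lookahead sketch: fast weights take k inner steps, then the slow
    weights move a fraction alpha of the way toward them."""
    slow = np.array(w0, dtype=float)
    for _ in range(outer_steps):
        fast = slow.copy()
        for _ in range(k):               # k steps "forward" with the inner optimizer
            fast -= lr * grad(fast)      # plain gradient step stands in for SGD here
        slow += alpha * (fast - slow)    # 1 step "back": interpolate slow toward fast
    return slow

# Example: minimize f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w_opt = lookahead_sgd(grad=lambda w: w, w0=np.array([3.0, -2.0]))
```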

Train faster, generalize better: Stability of stochastic gradient descent

M Hardt, B Recht, Y Singer - International conference on …, 2016 - proceedings.mlr.press
We show that parametric models trained by a stochastic gradient method (SGM) with few
iterations have vanishing generalization error. We prove our results by arguing that SGM is …
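
The stability notion behind that argument is uniform stability (standard definition, restated here): an algorithm $A$ is $\varepsilon$-uniformly stable if, for every pair of datasets $S, S'$ differing in a single example,

\[
  \sup_{z} \; \mathbb{E}_{A}\bigl[\ell(A(S); z) - \ell(A(S'); z)\bigr] \;\le\; \varepsilon,
\]

and uniform stability upper-bounds the expected generalization gap; for SGM the bound shrinks as the number of iterations and the step sizes shrink, which is the sense in which training faster generalizes better.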

An introduction to continuous optimization for imaging

A Chambolle, T Pock - Acta Numerica, 2016 - cambridge.org
A large number of imaging problems reduce to the optimization of a cost function, with
typical structural properties. The aim of this paper is to describe the state of the art in …
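
A concrete instance of the structure meant here (an illustrative example, not one drawn from the paper) is total-variation-regularized reconstruction,

\[
  \min_{u} \; \tfrac{1}{2}\,\|Au - b\|_2^2 \;+\; \lambda\,\|\nabla u\|_1,
\]

which fits the saddle-point template $\min_x \max_y \, \langle Kx, y\rangle + G(x) - F^{*}(y)$ that first-order primal–dual methods such as Chambolle–Pock are built for.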

A differential equation for modeling Nesterov's accelerated gradient method: Theory and insights

W Su, S Boyd, EJ Candes - Journal of Machine Learning Research, 2016 - jmlr.org
We derive a second-order ordinary differential equation (ODE) which is the limit of
Nesterov's accelerated gradient method. This ODE exhibits approximate equivalence to …
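
The ODE in question, obtained in the limit of vanishing step size $s$, is

\[
  \ddot{X}(t) \;+\; \frac{3}{t}\,\dot{X}(t) \;+\; \nabla f\bigl(X(t)\bigr) \;=\; 0,
\]

whose solutions satisfy $f(X(t)) - f^{\star} = O(1/t^{2})$, mirroring the $O(1/k^{2})$ rate of Nesterov's method.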

A variational perspective on accelerated methods in optimization

A Wibisono, AC Wilson, MI Jordan - … of the National Academy of Sciences, 2016 - pnas.org
Accelerated gradient methods play a central role in optimization, achieving optimal rates in
many settings. Although many generalizations and extensions of Nesterov's original …
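
The construction there is built on the Bregman Lagrangian (stated from memory, so the exact scaling factors are worth checking against the paper),

\[
  \mathcal{L}(X, V, t) \;=\; e^{\alpha_t + \gamma_t}\Bigl( D_h\bigl(X + e^{-\alpha_t} V,\, X\bigr) \;-\; e^{\beta_t} f(X) \Bigr),
\]

whose Euler–Lagrange equations, under the paper's ideal-scaling conditions on $\alpha_t, \beta_t, \gamma_t$, generate a family of continuous-time accelerated dynamics.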

Understanding the acceleration phenomenon via high-resolution differential equations

B Shi, SS Du, MI Jordan, WJ Su - Mathematical Programming, 2022 - Springer
Gradient-based optimization algorithms can be studied from the perspective of limiting
ordinary differential equations (ODEs). Motivated by the fact that existing ODEs do not …
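
The distinguishing feature is an $O(\sqrt{s})$ gradient-correction term that the limiting (low-resolution) ODE discards; for Nesterov's method on convex functions the high-resolution ODE takes, as I recall it (coefficients worth verifying against the paper), the form

\[
  \ddot{X} \;+\; \frac{3}{t}\,\dot{X} \;+\; \sqrt{s}\,\nabla^{2} f(X)\,\dot{X} \;+\; \Bigl(1 + \frac{3\sqrt{s}}{2t}\Bigr)\nabla f(X) \;=\; 0.
\]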

Federated accelerated stochastic gradient descent

H Yuan, T Ma - Advances in Neural Information Processing …, 2020 - proceedings.neurips.cc
We propose Federated Accelerated Stochastic Gradient Descent (FedAc), a
principled acceleration of Federated Averaging (FedAvg, also known as Local SGD) for …
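
FedAc accelerates the FedAvg / Local SGD baseline; the sketch below shows only that baseline (FedAc adds extra iterate sequences not reproduced here), with made-up client objectives, step size, and round counts.

```python
import numpy as np

def fedavg(client_grads, w0, lr=0.1, local_steps=10, rounds=50):
    """FedAvg / Local SGD sketch: every round, each client runs a few local
    gradient steps from the current global model and the server averages."""
    w = np.array(w0, dtype=float)
    for _ in range(rounds):
        local_models = []
        for grad in client_grads:            # one gradient oracle per client
            w_local = w.copy()
            for _ in range(local_steps):     # local SGD on the client's own objective
                w_local -= lr * grad(w_local)
            local_models.append(w_local)
        w = np.mean(local_models, axis=0)    # server step: average the client models
    return w

# Example: two clients whose quadratics are centered at +1 and -1; the
# averaged model converges toward the midpoint 0.
clients = [lambda w: w - 1.0, lambda w: w + 1.0]
w_global = fedavg(clients, w0=np.zeros(2))
```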

The limit points of (optimistic) gradient descent in min-max optimization

C Daskalakis, I Panageas - Advances in neural information …, 2018 - proceedings.neurips.cc
Motivated by applications in Optimization, Game Theory, and the training of Generative
Adversarial Networks, the convergence properties of first order methods in min-max …
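
For the unconstrained zero-sum problem $\min_x \max_y f(x, y)$, writing $z = (x, y)$ and $F(z) = (\nabla_x f(x, y),\, -\nabla_y f(x, y))$, the optimistic gradient descent/ascent update studied there can be written as

\[
  z_{t+1} \;=\; z_t \;-\; 2\eta\, F(z_t) \;+\; \eta\, F(z_{t-1}),
\]

i.e., a plain gradient step plus the correction $-\eta\,\bigl(F(z_t) - F(z_{t-1})\bigr)$, which is what damps the cycling exhibited by vanilla gradient descent/ascent.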