Katyusha: The first direct acceleration of stochastic gradient methods

Z Allen-Zhu - Journal of Machine Learning Research, 2018 - jmlr.org
Nesterov's momentum trick is famously known for accelerating gradient descent, and has
been proven useful in building fast iterative algorithms. However, in the stochastic setting …
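
The momentum trick referenced here has a simple deterministic template. Below is a minimal sketch of Nesterov-style accelerated gradient descent, not Katyusha itself (whose stochastic variance-reduction and coupling steps are the paper's contribution); grad_f, x0, L, and num_iters are assumed names for the gradient oracle, starting point, smoothness constant, and iteration budget.

```python
import numpy as np

def nesterov_agd(grad_f, x0, L, num_iters=100):
    """Nesterov-style accelerated gradient descent (deterministic sketch)."""
    x = y = np.asarray(x0, dtype=float)
    t = 1.0
    for _ in range(num_iters):
        x_next = y - grad_f(y) / L                         # gradient step at the extrapolated point
        t_next = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        beta = (t - 1.0) / t_next                          # momentum weight from the t_k recursion
        y = x_next + beta * (x_next - x)                   # extrapolation ("momentum") step
        x, t = x_next, t_next
    return x
```

As the snippet hints, it is exactly this extrapolation step that does not transfer to the stochastic setting without further modification, which is the gap Katyusha addresses.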

Convex optimization: Algorithms and complexity

S Bubeck - Foundations and Trends® in Machine Learning, 2015 - nowpublishers.com
This monograph presents the main complexity theorems in convex optimization and their
corresponding algorithms. Starting from the fundamental theory of black-box optimization …
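
For orientation, the kind of black-box complexity statements the monograph organizes look like the following (informal, with R = ‖x_0 − x⋆‖ and k the iteration count):

```latex
\begin{align*}
  \text{subgradient method ($G$-Lipschitz convex $f$):}    \quad & f(\bar x_k) - f^\star = O\big(GR/\sqrt{k}\big)\\
  \text{gradient descent ($L$-smooth convex $f$):}         \quad & f(x_k) - f^\star = O\big(LR^2/k\big)\\
  \text{Nesterov acceleration ($L$-smooth convex $f$):}    \quad & f(x_k) - f^\star = O\big(LR^2/k^2\big)\\
  \text{first-order lower bound (dimension large enough):} \quad & \Omega\big(LR^2/k^2\big)
\end{align*}
```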

A variational perspective on accelerated methods in optimization

A Wibisono, AC Wilson, MI Jordan - … of the National Academy of Sciences, 2016 - pnas.org
Accelerated gradient methods play a central role in optimization, achieving optimal rates in
many settings. Although many generalizations and extensions of Nesterov's original …
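
The variational construction generates accelerated dynamics from a time-dependent Lagrangian. The sketch below is a rendering of the Bregman Lagrangian from this line of work (h is a distance-generating function with Bregman divergence D_h, and α_t, β_t, γ_t are scaling functions subject to ideal-scaling conditions); the exact form should be checked against the paper.

```latex
\mathcal{L}(X, V, t) = e^{\gamma_t + \alpha_t}\Big( D_h\big(X + e^{-\alpha_t} V,\, X\big) - e^{\beta_t} f(X) \Big),
\qquad
D_h(y, x) = h(y) - h(x) - \langle \nabla h(x),\, y - x \rangle .
```

Its Euler-Lagrange equations yield ODEs whose solutions satisfy f(X(t)) − f⋆ = O(e^{−β_t}), which is where the continuous-time rates come from.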

Understanding the acceleration phenomenon via high-resolution differential equations

B Shi, SS Du, MI Jordan, WJ Su - Mathematical Programming, 2022 - Springer
Gradient-based optimization algorithms can be studied from the perspective of limiting
ordinary differential equations (ODEs). Motivated by the fact that existing ODEs do not …
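
For reference, the low-resolution ODE limit of Nesterov's method for convex f is the equation below; the high-resolution analysis in this paper keeps stepsize-dependent terms, in particular a gradient-correction (Hessian-driven damping) term of order √s ∇²f(X)Ẋ, with exact coefficients as given in the paper.

```latex
\ddot X(t) + \frac{3}{t}\,\dot X(t) + \nabla f\big(X(t)\big) = 0,
\qquad
f\big(X(t)\big) - f^\star = O\big(1/t^2\big).
```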

Acceleration methods

A d'Aspremont, D Scieur, A Taylor - Foundations and Trends® …, 2021 - nowpublishers.com
This monograph covers some recent advances in a range of acceleration techniques
frequently used in convex optimization. We first use quadratic optimization problems to …
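
Quadratics are the natural starting point because momentum parameters can be tuned directly against the spectrum. Below is a minimal sketch of Polyak's heavy-ball method with the classical tuning for a strongly convex quadratic; the function name and interface are illustrative, not the monograph's.

```python
import numpy as np

def heavy_ball_quadratic(A, b, x0, num_iters=200):
    """Heavy-ball method on f(x) = 0.5 x^T A x - b^T x, with A symmetric positive definite."""
    eigvals = np.linalg.eigvalsh(A)
    mu, L = eigvals[0], eigvals[-1]                    # extreme eigenvalues of A
    alpha = 4.0 / (np.sqrt(L) + np.sqrt(mu)) ** 2      # stepsize tuned for quadratics
    beta = ((np.sqrt(L) - np.sqrt(mu)) / (np.sqrt(L) + np.sqrt(mu))) ** 2  # momentum weight
    x_prev = x = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        grad = A @ x - b
        x_prev, x = x, x - alpha * grad + beta * (x - x_prev)
    return x
```

With this tuning the per-iteration contraction factor scales with the square root of the condition number rather than the condition number itself, which is the quadratic-case picture such surveys typically begin with.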

Acceleration by stepsize hedging: Multi-step descent and the silver stepsize schedule

J Altschuler, P Parrilo - Journal of the ACM, 2023 - dl.acm.org
Can we accelerate the convergence of gradient descent without changing the algorithm—
just by judiciously choosing stepsizes? Surprisingly, we show that the answer is yes. Our …
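
The silver schedule itself is defined recursively in the paper and is not reproduced here; the sketch below only shows the object the result is about, namely vanilla gradient descent whose sole degree of freedom is a prescribed stepsize sequence (schedule is a placeholder for the silver schedule or any other choice).

```python
import numpy as np

def gd_with_schedule(grad_f, x0, schedule):
    """Plain gradient descent driven by a prescribed stepsize sequence.

    The algorithm is unchanged; only the stepsizes h_1, h_2, ... vary per iteration.
    """
    x = np.asarray(x0, dtype=float)
    for h in schedule:
        x = x - h * grad_f(x)
    return x
```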

Accelerated methods for nonconvex optimization

Y Carmon, JC Duchi, O Hinder, A Sidford - SIAM Journal on Optimization, 2018 - SIAM
We present an accelerated gradient method for nonconvex optimization problems with
Lipschitz continuous first and second derivatives. In a time O(ε^{-7/4} log(1/ε)), the method …
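
For comparison (informal, and for the criterion ‖∇f(x)‖ ≤ ε): with only a Lipschitz gradient, plain gradient descent needs on the order of ε^{-2} gradient evaluations, while the method here exploits the additional Lipschitz Hessian to do better:

```latex
\text{gradient descent:}\; O(\varepsilon^{-2})
\qquad\text{vs.}\qquad
\text{accelerated method:}\; O\big(\varepsilon^{-7/4}\log(1/\varepsilon)\big)
\quad\text{to reach } \|\nabla f(x)\| \le \varepsilon .
```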

Accelerated gradient descent escapes saddle points faster than gradient descent

C Jin, P Netrapalli, MI Jordan - Conference On Learning …, 2018 - proceedings.mlr.press
Nesterov's accelerated gradient descent (AGD), an instance of the general family of
“momentum methods,” provably achieves a faster convergence rate than gradient descent …

A faster cutting plane method and its implications for combinatorial and convex optimization

YT Lee, A Sidford, SC Wong - 2015 IEEE 56th Annual …, 2015 - ieeexplore.ieee.org
In this paper we improve upon the running time for finding a point in a convex set given a
separation oracle. In particular, given a separation oracle for a convex set K ⊂ R^n that is …
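
For context, the classical baseline for this task is the central-cut ellipsoid method. Below is a minimal sketch (the textbook method, not the faster cutting-plane scheme of the paper); oracle(x) is assumed to return None when x is in K and otherwise a vector a with aᵀ(y − x) ≤ 0 for all y in K, and the dimension is assumed to be at least 2.

```python
import numpy as np

def ellipsoid_method(oracle, center, radius, num_iters=1000):
    """Classical central-cut ellipsoid method for finding a point in a convex set K."""
    c = np.asarray(center, dtype=float)
    n = c.size
    P = (radius ** 2) * np.eye(n)          # ellipsoid {x : (x - c)^T P^{-1} (x - c) <= 1}
    for _ in range(num_iters):
        a = oracle(c)
        if a is None:
            return c                       # current center lies in K
        a = np.asarray(a, dtype=float)
        b = P @ a / np.sqrt(a @ P @ a)
        c = c - b / (n + 1)                # shift the center into the kept halfspace
        P = (n * n / (n * n - 1.0)) * (P - (2.0 / (n + 1)) * np.outer(b, b))
    return None                            # budget exhausted without certifying a point
```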

Linear coupling: An ultimate unification of gradient and mirror descent

Z Allen-Zhu, L Orecchia - arXiv preprint arXiv:1407.1537, 2014 - arxiv.org
First-order methods play a central role in large-scale machine learning. Even though many
variations exist, each suited to a particular problem, almost all such methods fundamentally …
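
Below is a sketch of the coupling idea, specialized to the Euclidean case so the mirror step reduces to a gradient step with its own stepsize; the names tau and alpha are placeholders for the weights the paper's analysis prescribes (there they vary with the iteration).

```python
import numpy as np

def linear_coupling(grad_f, x0, L, alpha, tau, num_iters=100):
    """Couple a gradient step and a (Euclidean) mirror-descent step each iteration."""
    y = z = np.asarray(x0, dtype=float)
    for _ in range(num_iters):
        x = tau * z + (1.0 - tau) * y      # coupling point: convex combination of the two iterates
        g = grad_f(x)
        y = x - g / L                      # gradient step (primal progress)
        z = z - alpha * g                  # mirror step with Euclidean distance generator
    return y
```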