Variance-reduced methods for machine learning

RM Gower, M Schmidt, F Bach… - Proceedings of the …, 2020 - ieeexplore.ieee.org
Stochastic optimization lies at the heart of machine learning, and its cornerstone is
stochastic gradient descent (SGD), a method introduced over 60 years ago. The last eight …

Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator

C Fang, CJ Li, Z Lin, T Zhang - Advances in neural …, 2018 - proceedings.neurips.cc
In this paper, we propose a new technique named Stochastic Path-Integrated Differential EstimatoR (SPIDER), which can be used to track many deterministic quantities of …

Lower bounds for non-convex stochastic optimization

Y Arjevani, Y Carmon, JC Duchi, DJ Foster… - Mathematical …, 2023 - Springer
We lower bound the complexity of finding ϵ-stationary points (with gradient norm at most ϵ)
using stochastic first-order methods. In a well-studied model where algorithms access …

Momentum-based variance reduction in non-convex SGD

A Cutkosky, F Orabona - Advances in neural information …, 2019 - proceedings.neurips.cc
Variance reduction has emerged in recent years as a strong competitor to stochastic
gradient descent in non-convex problems, providing the first algorithms to improve upon the …
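
As a rough illustration of the momentum-based variance-reduced estimator this entry refers to (the STORM-style recursion d_t = g(x_t) + (1 - a)(d_{t-1} - g(x_{t-1})), with both gradients drawn using the same sample), here is a minimal sketch on a toy quadratic with synthetic Gaussian gradient noise; all objective and variable names are illustrative, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy objective f(x) = ||x||^2 / 2; its exact gradient is x.
# Stochastic gradients add Gaussian noise of scale 0.1.
x = np.array([2.0, -1.0])
d = x + 0.1 * rng.normal(size=x.shape)  # initial estimator: one stochastic gradient
lr, a = 0.05, 0.3                       # stepsize and momentum parameter (illustrative values)

for _ in range(300):
    x_new = x - lr * d
    z = rng.normal(size=x.shape)        # SAME sample used at both points (key to variance reduction)
    g_new = x_new + 0.1 * z
    g_old = x + 0.1 * z
    # STORM-style update: fresh gradient plus a damped correction term
    d = g_new + (1 - a) * (d - g_old)
    x = x_new
```

Because the same noise sample appears in both `g_new` and `g_old`, the correction term largely cancels it, so the estimator's error contracts geometrically while only one fresh gradient per step is needed.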

Provably faster algorithms for bilevel optimization

J Yang, K Ji, Y Liang - Advances in Neural Information …, 2021 - proceedings.neurips.cc
Bilevel optimization has been widely applied in many important machine learning
applications such as hyperparameter optimization and meta-learning. Recently, several …

Optimal stochastic non-smooth non-convex optimization through online-to-non-convex conversion

A Cutkosky, H Mehta… - … Conference on Machine …, 2023 - proceedings.mlr.press
We present new algorithms for optimizing non-smooth, non-convex stochastic objectives
based on a novel analysis technique. This improves the current best-known complexity for …

Adagrad stepsizes: Sharp convergence over nonconvex landscapes

R Ward, X Wu, L Bottou - Journal of Machine Learning Research, 2020 - jmlr.org
Adaptive gradient methods such as AdaGrad and its variants update the stepsize in
stochastic gradient descent on the fly according to the gradients received along the way; …
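
The AdaGrad stepsize rule analyzed here divides the learning rate, per coordinate, by the square root of the accumulated squared gradients. A minimal sketch on a toy quadratic, with illustrative constants not taken from the paper:

```python
import numpy as np

def adagrad_step(x, grad, accum, lr=0.1, eps=1e-8):
    """One AdaGrad update: per-coordinate stepsize lr / sqrt(accumulated squared grads)."""
    accum = accum + grad ** 2
    x = x - lr * grad / (np.sqrt(accum) + eps)
    return x, accum

# Minimize f(x) = ||x||^2 / 2, whose gradient is x itself.
x = np.array([1.0, -2.0])
accum = np.zeros_like(x)
for _ in range(200):
    x, accum = adagrad_step(x, x, accum)
```

The stepsize shrinks automatically as gradients accumulate, which is what lets the method adapt without knowing smoothness or noise constants in advance.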

A near-optimal algorithm for stochastic bilevel optimization via double-momentum

P Khanduri, S Zeng, M Hong, HT Wai… - Advances in neural …, 2021 - proceedings.neurips.cc
This paper proposes a new algorithm, the Single-timescale Double-momentum Stochastic Approximatio …

PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization

Z Li, H Bao, X Zhang… - … conference on machine …, 2021 - proceedings.mlr.press
In this paper, we propose a novel stochastic gradient estimator—ProbAbilistic Gradient
Estimator (PAGE)—for nonconvex optimization. PAGE is easy to implement as it is designed …
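
The PAGE estimator described here flips a biased coin each step: with probability p it takes a full (or large-batch) gradient, and otherwise it reuses the previous estimate plus a cheap minibatch gradient difference. A minimal sketch on a toy finite-sum quadratic, where the objective, batch sizes, and helper names are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Finite-sum objective f(x) = (1/n) * sum_i ||x - c_i||^2 / 2; minimizer is mean(c_i).
C = rng.normal(size=(16, 3))

def full_grad(x):
    return x - C.mean(axis=0)

def minibatch_grad_diff(x_new, x_old, idx):
    # Per-component gradient is (x - c_i), so the difference x_new - x_old
    # happens to be independent of i for this toy quadratic.
    return x_new - x_old

x = np.zeros(3)
g = full_grad(x)        # start from one full gradient
lr, p = 0.2, 0.25       # stepsize and switching probability (illustrative values)

for _ in range(200):
    x_new = x - lr * g
    if rng.random() < p:
        g = full_grad(x_new)                      # occasional full-gradient refresh
    else:
        idx = rng.integers(0, 16, size=4)          # cheap minibatch correction
        g = g + minibatch_grad_diff(x_new, x, idx)
    x = x_new
```

On this quadratic the minibatch difference is exact, so the run converges to the minimizer; in general the coin flip trades per-step cost against estimator variance.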

[BOOK] First-order and stochastic optimization methods for machine learning

G Lan - 2020 - Springer
Since its beginning, optimization has played a vital role in data science. The analysis and
solution methods for many statistical and machine learning models rely on optimization. The …