Variance-reduced methods for machine learning
Stochastic optimization lies at the heart of machine learning, and its cornerstone is
stochastic gradient descent (SGD), a method introduced over 60 years ago. The last eight …
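Since SGD recurs throughout this list, a minimal sketch of a single SGD run may help fix notation; the callable stochastic_grad and the toy least-squares example are placeholders chosen for illustration, not taken from the survey.

```python
import numpy as np

def sgd(x0, stochastic_grad, lr=0.01, n_steps=1000, seed=0):
    """Plain SGD: repeatedly step against an unbiased stochastic gradient.

    stochastic_grad(x, rng) should return an unbiased estimate of the full
    gradient at x (for example, the gradient on one uniformly sampled data point).
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    for _ in range(n_steps):
        g = stochastic_grad(x, rng)   # noisy but unbiased gradient estimate
        x -= lr * g                   # move against the estimate
    return x

# Toy usage: least squares with one randomly sampled row per step.
A = np.random.default_rng(1).normal(size=(200, 5))
b = A @ np.ones(5)

def one_row_grad(x, rng):
    i = rng.integers(len(A))
    return A[i] * (A[i] @ x - b[i])

x_hat = sgd(np.zeros(5), one_row_grad, lr=0.05, n_steps=5000)
```

The variance-reduced methods the survey covers keep this outer loop but replace the raw estimate g with a corrected estimate whose variance shrinks as the iterates settle down.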
SPIDER: Near-optimal non-convex optimization via stochastic path-integrated differential estimator
In this paper, we propose a new technique named Stochastic Path-Integrated Differential EstimatoR (SPIDER), which can be used to track many deterministic quantities of …
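The snippet names the estimator but not the update, so the following is a sketch of the SPIDER-type recursion as it is usually presented; full_grad(x) and minibatch_grad(x, idx) are hypothetical gradient oracles, n_data is an assumed dataset size, and the normalized O(ε)-length steps of the original SPIDER-SFO are omitted for brevity.

```python
import numpy as np

def spider_sfo(x0, full_grad, minibatch_grad, lr=0.01, q=50, batch=32,
               n_data=10_000, n_steps=1000, seed=0):
    """Sketch of a SPIDER-type loop: a recursive gradient estimator that is
    refreshed with a full (or large-batch) gradient every q steps and is
    otherwise updated with a gradient *difference* evaluated on the same
    minibatch at the current and the previous iterate.
    """
    rng = np.random.default_rng(seed)
    x_prev = np.array(x0, dtype=float)
    v = full_grad(x_prev)                  # initial estimate: full/large-batch gradient
    x = x_prev - lr * v
    for t in range(1, n_steps):
        if t % q == 0:
            v = full_grad(x)               # periodic refresh of the estimator
        else:
            idx = rng.integers(0, n_data, size=batch)  # minibatch indices (n_data is assumed)
            # Same minibatch at both iterates, so most of the sampling noise cancels.
            v = v + minibatch_grad(x, idx) - minibatch_grad(x_prev, idx)
        x_prev, x = x, x - lr * v
    return x
```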
Lower bounds for non-convex stochastic optimization
We lower bound the complexity of finding ϵ-stationary points (with gradient norm at most ϵ)
using stochastic first-order methods. In a well-studied model where algorithms access …
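In symbols, the target whose oracle complexity is bounded here, and against which the estimators elsewhere in this list are analysed, is simply (restating the snippet):

\[
  \text{find } x \ \text{such that} \ \|\nabla F(x)\| \le \epsilon .
\]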
Momentum-based variance reduction in non-convex SGD
A Cutkosky, F Orabona - Advances in neural information …, 2019 - proceedings.neurips.cc
Variance reduction has emerged in recent years as a strong competitor to stochastic
gradient descent in non-convex problems, providing the first algorithms to improve upon the …
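For concreteness, here is a sketch of the kind of momentum-corrected estimator this paper (STORM) is associated with; draw_sample and stoch_grad are hypothetical oracles, and the adaptive stepsize and momentum schedules of the paper are replaced by constants to keep the sketch short.

```python
import numpy as np

def storm_style(x0, draw_sample, stoch_grad, lr=0.01, a=0.1, n_steps=1000, seed=0):
    """Sketch of a momentum-based variance-reduced (STORM-style) loop.

    d_t = grad(x_t; xi_t) + (1 - a) * (d_{t-1} - grad(x_{t-1}; xi_t)):
    an exponential moving average of gradients plus a correction evaluated on
    the SAME sample xi_t at the previous iterate, which removes the need for
    the large batches or checkpoints of earlier variance-reduced methods.
    """
    rng = np.random.default_rng(seed)
    x_prev = np.array(x0, dtype=float)
    xi = draw_sample(rng)
    d = stoch_grad(x_prev, xi)              # initialise with a plain stochastic gradient
    x = x_prev - lr * d
    for _ in range(n_steps - 1):
        xi = draw_sample(rng)               # one fresh sample per step
        g_new = stoch_grad(x, xi)
        g_old = stoch_grad(x_prev, xi)      # same sample, previous iterate
        d = g_new + (1.0 - a) * (d - g_old) # momentum + variance-reduction correction
        x_prev, x = x, x - lr * d
    return x
```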
Provably faster algorithms for bilevel optimization
Bilevel optimization has been widely applied in many important machine learning
applications such as hyperparameter optimization and meta-learning. Recently, several …
Optimal stochastic non-smooth non-convex optimization through online-to-non-convex conversion
A Cutkosky, H Mehta… - … Conference on Machine …, 2023 - proceedings.mlr.press
We present new algorithms for optimizing non-smooth, non-convex stochastic objectives
based on a novel analysis technique. This improves the current best-known complexity for …
AdaGrad stepsizes: Sharp convergence over nonconvex landscapes
Adaptive gradient methods such as AdaGrad and its variants update the stepsize in
stochastic gradient descent on the fly according to the gradients received along the way; …
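As a reference point, here is a sketch of the standard diagonal AdaGrad update; whether the snippet refers to the diagonal or the scalar "norm" variant is not visible from the abstract, so treat this purely as an illustration of stepsizes adapted from past gradients, with stoch_grad a placeholder oracle.

```python
import numpy as np

def adagrad(x0, stoch_grad, base_lr=0.1, eps=1e-8, n_steps=1000, seed=0):
    """Sketch of diagonal AdaGrad: each coordinate's effective stepsize shrinks
    according to the squared gradients accumulated so far, so no hand-tuned
    decay schedule is needed.
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    accum = np.zeros_like(x)                        # running sum of squared gradients
    for _ in range(n_steps):
        g = stoch_grad(x, rng)
        accum += g ** 2
        x -= base_lr * g / (np.sqrt(accum) + eps)   # per-coordinate adaptive stepsize
    return x
```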
A near-optimal algorithm for stochastic bilevel optimization via double-momentum
This paper proposes a new algorithm, the Single-timescale Double-momentum Stochastic Approximation …
PAGE: A simple and optimal probabilistic gradient estimator for nonconvex optimization
In this paper, we propose a novel stochastic gradient estimator—ProbAbilistic Gradient
Estimator (PAGE)—for nonconvex optimization. PAGE is easy to implement as it is designed …
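Structurally, PAGE is close to the SPIDER loop sketched earlier, except that the full-gradient refresh is triggered by a biased coin flip rather than on a fixed schedule; the sketch below follows that description, again with hypothetical full_grad and minibatch_grad oracles and an assumed dataset size n_data.

```python
import numpy as np

def page(x0, full_grad, minibatch_grad, lr=0.01, p=0.05, batch=32,
         n_data=10_000, n_steps=1000, seed=0):
    """Sketch of a PAGE-style estimator: at every step flip a biased coin;
    with probability p recompute a full (or large-batch) gradient, otherwise
    reuse the previous estimate plus a small-batch gradient difference.
    """
    rng = np.random.default_rng(seed)
    x_prev = np.array(x0, dtype=float)
    v = full_grad(x_prev)
    x = x_prev - lr * v
    for _ in range(n_steps - 1):
        if rng.random() < p:
            v = full_grad(x)                           # occasional full refresh
        else:
            idx = rng.integers(0, n_data, size=batch)  # small minibatch, same at both iterates
            v = v + minibatch_grad(x, idx) - minibatch_grad(x_prev, idx)
        x_prev, x = x, x - lr * v
    return x
```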
[BOOK][B] First-order and stochastic optimization methods for machine learning
G Lan - 2020 - Springer
Since its beginning, optimization has played a vital role in data science. The analysis and
solution methods for many statistical and machine learning models rely on optimization. The …