Variance-reduced methods for machine learning

RM Gower, M Schmidt, F Bach… - Proceedings of the …, 2020 - ieeexplore.ieee.org
Stochastic optimization lies at the heart of machine learning, and its cornerstone is
stochastic gradient descent (SGD), a method introduced over 60 years ago. The last eight …
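
As a baseline for the variance-reduced methods the survey covers, a minimal SGD sketch; the function names and the toy quadratic objective are illustrative assumptions, not taken from the paper:

```python
import numpy as np

def sgd(grad_i, w0, n, lr=0.1, steps=1000, seed=0):
    """Plain SGD: each step follows the gradient of one randomly sampled component f_i."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(steps):
        i = rng.integers(n)       # sample a component uniformly at random
        w -= lr * grad_i(w, i)    # w <- w - lr * grad f_i(w)
    return w

# Toy finite sum: f(w) = (1/n) sum_i 0.5*||w - a_i||^2, minimized at the mean of the a_i.
a = np.random.default_rng(1).normal(size=(50, 3))
w_hat = sgd(lambda w, i: w - a[i], np.zeros(3), n=50, lr=0.05)
```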

Federated optimization: Distributed machine learning for on-device intelligence

J Konečný, HB McMahan, D Ramage… - arXiv preprint arXiv …, 2016 - arxiv.org
We introduce a new and increasingly relevant setting for distributed optimization in machine
learning, where the data defining the optimization are unevenly distributed over an …
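
A sketch of one local-update-then-average communication round in the spirit of this federated setting; the size-weighted averaging, toy objective, and parameter names are assumptions for illustration, not the paper's algorithm:

```python
import numpy as np

def federated_round(w, client_data, local_steps=5, lr=0.1):
    """One communication round, sketched: each client runs a few local SGD steps
    on its own (possibly unevenly sized) data, then the server averages the
    resulting models, weighted by client data size."""
    updates, sizes = [], []
    for a in client_data:                      # a: one client's data points
        w_local = w.copy()
        for _ in range(local_steps):
            g = np.mean(w_local - a, axis=0)   # gradient of mean_i 0.5*||w - a_i||^2
            w_local -= lr * g
        updates.append(w_local)
        sizes.append(len(a))
    return np.average(updates, axis=0, weights=np.asarray(sizes, dtype=float))

# Unevenly distributed toy data across four clients.
rng = np.random.default_rng(0)
clients = [rng.normal(size=(n, 3)) for n in (5, 50, 8, 200)]
w = np.zeros(3)
for _ in range(20):
    w = federated_round(w, clients)
```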

Stochastic Polyak step-size for SGD: An adaptive learning rate for fast convergence

N Loizou, S Vaswani, IH Laradji… - International …, 2021 - proceedings.mlr.press
We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly
used in the subgradient method. Although computing the Polyak step-size requires …
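
The stochastic Polyak step-size has a closed form per sampled component; a minimal sketch of one update, assuming the capped variant with a small guard in the denominator (names and defaults are illustrative):

```python
import numpy as np

def sps_step(w, f_i, grad_i, f_i_star=0.0, c=0.5, gamma_max=1.0):
    """One SGD step with the stochastic Polyak step-size:
    gamma = (f_i(w) - f_i*) / (c * ||grad f_i(w)||^2), capped at gamma_max.
    f_i_star is the minimum of the sampled component; it is 0 for models
    that can drive each individual loss to zero (interpolation)."""
    g = grad_i(w)
    gamma = (f_i(w) - f_i_star) / (c * np.dot(g, g) + 1e-12)
    return w - min(gamma, gamma_max) * g
```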

Non-convex finite-sum optimization via SCSG methods

L Lei, C Ju, J Chen, MI Jordan - Advances in Neural …, 2017 - proceedings.neurips.cc
We develop a class of algorithms, as variants of the stochastically controlled stochastic
gradient (SCSG) methods, for the smooth nonconvex finite-sum optimization problem. Only …
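
A sketch of one SCSG outer iteration under simplifying assumptions (uniform sampling, single-sample inner updates; parameter names are illustrative): the snapshot gradient is computed on a subsample rather than the full dataset, and the inner loop length is geometrically distributed.

```python
import numpy as np

def scsg_epoch(w, grad_i, n, batch=32, lr=0.05, seed=0):
    """One SCSG outer iteration, sketched: estimate a snapshot gradient on a
    subsample, then run SVRG-style corrected updates for a geometrically
    distributed number of inner steps (so only O(batch) gradients are needed)."""
    rng = np.random.default_rng(seed)
    B = rng.choice(n, size=batch, replace=False)
    mu = np.mean([grad_i(w, j) for j in B], axis=0)  # subsampled snapshot gradient
    w_snap = w.copy()
    T = rng.geometric(1.0 / batch)                   # E[T] = batch
    for _ in range(T):
        i = rng.integers(n)
        v = grad_i(w, i) - grad_i(w_snap, i) + mu    # variance-reduced estimate
        w = w - lr * v
    return w
```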

Online batch selection for faster training of neural networks

I Loshchilov, F Hutter - arXiv preprint arXiv:1511.06343, 2015 - arxiv.org
Deep neural networks are commonly trained using stochastic non-convex optimization
procedures, which are driven by gradient information estimated on fractions (batches) of the …
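
A sketch of rank-based selection in the spirit of this paper: examples are ranked by their most recently observed loss and sampled with probability decaying exponentially in rank; the exact parameterization and names are assumptions.

```python
import numpy as np

def select_batch(losses, batch_size=64, s=100.0, seed=0):
    """Pick a batch by loss rank: sort examples by last known loss and sample
    indices with exponentially decaying probability over ranks, so the hardest
    examples are revisited most often. 's' is (roughly) the probability ratio
    between the highest- and lowest-ranked example."""
    rng = np.random.default_rng(seed)
    n = len(losses)
    order = np.argsort(-np.asarray(losses))      # rank 0 = largest loss
    p = np.exp(np.log(s) / n) ** -np.arange(n)   # geometric decay over ranks
    p /= p.sum()
    return order[rng.choice(n, size=batch_size, p=p)]

losses = np.random.default_rng(1).exponential(size=1000)  # e.g. last-seen losses
batch_idx = select_batch(losses)
```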

[BOOK][B] Optimization for machine learning

S Sra, S Nowozin, SJ Wright - 2011 - books.google.com
An up-to-date account of the interplay between optimization and machine learning,
accessible to students and researchers in both communities. The interplay between …

Stochastic nested variance reduction for nonconvex optimization

D Zhou, P Xu, Q Gu - Journal of Machine Learning Research, 2020 - jmlr.org
We study nonconvex optimization problems, where the objective function is either an
average of n nonconvex functions or the expectation of some stochastic function. We …
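
The two problem templates named in the snippet, written out (standard finite-sum and expectation notation):

```latex
\min_{x \in \mathbb{R}^d} f(x) = \frac{1}{n} \sum_{i=1}^{n} f_i(x)
\qquad \text{or} \qquad
\min_{x \in \mathbb{R}^d} f(x) = \mathbb{E}_{\xi}\left[F(x;\xi)\right],
\quad \text{with each } f_i \text{ (or } F(\cdot\,;\xi)\text{) possibly nonconvex.}
```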

Stochastic variance-reduced policy gradient

M Papini, D Binaghi, G Canonaco… - International …, 2018 - proceedings.mlr.press
In this paper, we propose a novel reinforcement-learning algorithm consisting of a stochastic
variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs) …
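
A heavily simplified sketch of the SVRG-style policy-gradient estimator on a toy one-step MDP; the Gaussian policy, the reward, and all names are assumptions, not the paper's setup. The key ingredient is the importance weight that reweights snapshot-policy gradients on trajectories drawn from the current policy.

```python
import numpy as np

rng = np.random.default_rng(0)

def episode(theta):
    """Toy one-step MDP: action a ~ N(theta, 1), reward -(a - 3)^2."""
    a = rng.normal(theta, 1.0)
    return a, -(a - 3.0) ** 2

def grad_log_pi(theta, a):
    return a - theta  # d/dtheta of log N(a; theta, 1)

def svrpg_epoch(theta, N=200, m=10, B=10, lr=0.05):
    """One epoch, sketched: a large-batch snapshot policy gradient mu, then a
    few small-batch steps using the SVRG-style corrected estimator."""
    snap = theta
    eps = [episode(snap) for _ in range(N)]
    mu = np.mean([grad_log_pi(snap, a) * r for a, r in eps])
    for _ in range(m):
        batch = [episode(theta) for _ in range(B)]
        def iw(a):  # importance weight pi_snap(a) / pi_theta(a) for this policy
            return np.exp(-0.5 * ((a - snap) ** 2 - (a - theta) ** 2))
        v = np.mean([grad_log_pi(theta, a) * r - iw(a) * grad_log_pi(snap, a) * r
                     for a, r in batch]) + mu
        theta += lr * v  # ascent on expected reward
    return theta

theta = 0.0
for _ in range(50):
    theta = svrpg_epoch(theta)  # should drift toward the reward-maximizing mean 3
```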

Barzilai-Borwein step size for stochastic gradient descent

C Tan, S Ma, YH Dai, Y Qian - Advances in Neural …, 2016 - proceedings.neurips.cc
One of the major issues in stochastic gradient descent (SGD) methods is how to choose an
appropriate step size while running the algorithm. Since the traditional line search technique …
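
The Barzilai-Borwein step size itself is a two-line formula; a minimal sketch of the BB1 variant (the per-epoch averaging the paper uses to stabilize it for SGD is omitted, and the denominator guard is an assumption):

```python
import numpy as np

def bb_step_size(w_prev, w, g_prev, g):
    """Barzilai-Borwein (BB1) step size: eta = ||s||^2 / (s . y), with
    s = w - w_prev and y = g - g_prev, mimicking a quasi-Newton scaling
    along the most recent step."""
    s, y = w - w_prev, g - g_prev
    return float(np.dot(s, s) / (np.dot(s, y) + 1e-12))
```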

Stochastic variance reduction methods for saddle-point problems

B Palaniappan, F Bach - Advances in Neural Information …, 2016 - proceedings.neurips.cc
We consider convex-concave saddle-point problems where the objective functions may be
split into many components, and extend recent stochastic variance reduction methods (such …
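
A sketch of a plain SVRG epoch applied to the saddle-point operator F_i(x, y) = (grad_x K_i, -grad_y K_i), assuming uniform sampling and a single step size; the paper's full method is richer (proximal steps, non-uniform sampling), so this shows only the variance-reduction idea.

```python
import numpy as np

def svrg_saddle_epoch(x, y, op_i, n, m=50, lr=0.05, seed=0):
    """One SVRG epoch for min_x max_y (1/n) sum_i K_i(x, y), sketched:
    apply variance reduction to the monotone operator
    F_i(x, y) = (grad_x K_i, -grad_y K_i) and step along -F."""
    rng = np.random.default_rng(seed)
    x0, y0 = x.copy(), y.copy()
    ops = [op_i(x0, y0, j) for j in range(n)]   # full operator at the snapshot
    Fx = np.mean([o[0] for o in ops], axis=0)
    Fy = np.mean([o[1] for o in ops], axis=0)
    for _ in range(m):
        i = rng.integers(n)
        gx, gy = op_i(x, y, i)
        hx, hy = op_i(x0, y0, i)
        x = x - lr * (gx - hx + Fx)   # descend in x
        y = y - lr * (gy - hy + Fy)   # ascent in y is encoded in F's sign flip
    return x, y
```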