Variance-reduced methods for machine learning
Stochastic optimization lies at the heart of machine learning, and its cornerstone is
stochastic gradient descent (SGD), a method introduced over 60 years ago. The last eight …
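As a concrete illustration of the variance-reduction idea this survey covers, here is a minimal SVRG-style sketch on a toy least-squares finite sum. The problem data, step size, and loop lengths are illustrative choices, not taken from the paper:

```python
import random

# Toy finite sum: f(w) = (1/n) * sum_i (a_i*w - b_i)^2, with data chosen
# so that every component is minimized at the same point w* = 2.0.
a = [1.0, 2.0, 3.0, 4.0]
b = [2.0, 4.0, 6.0, 8.0]
n = len(a)

def grad_i(w, i):
    # Gradient of the i-th component (a_i*w - b_i)^2.
    return 2.0 * a[i] * (a[i] * w - b[i])

def svrg(w0, step=0.01, epochs=20, inner=40, seed=0):
    rng = random.Random(seed)
    w = w0
    for _ in range(epochs):
        w_snap = w
        # Full gradient at the snapshot point (one pass over the data).
        mu = sum(grad_i(w_snap, i) for i in range(n)) / n
        for _ in range(inner):
            i = rng.randrange(n)
            # Variance-reduced estimate: unbiased, and its variance
            # shrinks as w and w_snap both approach the optimum.
            g = grad_i(w, i) - grad_i(w_snap, i) + mu
            w -= step * g
    return w

w_hat = svrg(w0=0.0)  # converges toward the minimizer w* = 2.0
```

Plain SGD on the same problem would need a decaying step size to converge; the control-variate term `- grad_i(w_snap, i) + mu` is what lets a constant step work.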
Federated optimization: Distributed machine learning for on-device intelligence
We introduce a new and increasingly relevant setting for distributed optimization in machine
learning, where the data defining the optimization are unevenly distributed over an …
Stochastic Polyak step-size for SGD: an adaptive learning rate for fast convergence
We propose a stochastic variant of the classical Polyak step-size (Polyak, 1987) commonly
used in the subgradient method. Although computing the Polyak step-size requires …
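The stochastic Polyak step-size sets the learning rate from the current sampled loss and gradient. A minimal sketch of the bounded SPS_max variant under interpolation (each f_i* = 0), on toy data of our own choosing; the constants c and gamma_max are illustrative:

```python
import random

# Toy interpolation problem: f_i(w) = 0.5*(a_i*w - b_i)^2 with b_i = 2*a_i,
# so every component attains its minimum f_i* = 0 at the shared point w* = 2.
a = [1.0, 2.0, 3.0]
b = [2.0, 4.0, 6.0]

def loss_i(w, i):
    return 0.5 * (a[i] * w - b[i]) ** 2

def grad_i(w, i):
    return a[i] * (a[i] * w - b[i])

def sgd_sps(w0, steps=100, c=0.5, gamma_max=10.0, seed=0):
    # Stochastic Polyak step: eta = (f_i(w) - f_i*) / (c * |g|^2),
    # capped at gamma_max; f_i* = 0 here by interpolation.
    rng = random.Random(seed)
    w = w0
    for _ in range(steps):
        i = rng.randrange(len(a))
        g = grad_i(w, i)
        if g == 0.0:
            continue  # this component is already exactly minimized
        eta = min(loss_i(w, i) / (c * g * g), gamma_max)
        w -= eta * g
    return w
```

On this quadratic the rule with c = 0.5 recovers the exact per-component Newton step, so `sgd_sps(0.0)` lands on w* = 2 essentially in one update; in general the cap gamma_max guards against huge steps when gradients are tiny.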
Non-convex finite-sum optimization via scsg methods
We develop a class of algorithms, as variants of the stochastically controlled stochastic
gradient (SCSG) methods, for the smooth nonconvex finite-sum optimization problem. Only …
Online batch selection for faster training of neural networks
Deep neural networks are commonly trained using stochastic non-convex optimization
procedures, which are driven by gradient information estimated on fractions (batches) of the …
[BOOK][B] Optimization for machine learning
An up-to-date account of the interplay between optimization and machine learning,
accessible to students and researchers in both communities. The interplay between …
Stochastic nested variance reduction for nonconvex optimization
We study nonconvex optimization problems, where the objective function is either an
average of n nonconvex functions or the expectation of some stochastic function. We …
Stochastic variance-reduced policy gradient
In this paper, we propose a novel reinforcement-learning algorithm consisting of a stochastic
variance-reduced version of policy gradient for solving Markov Decision Processes (MDPs) …
Barzilai-Borwein step size for stochastic gradient descent
One of the major issues in stochastic gradient descent (SGD) methods is how to choose an
appropriate step size while running the algorithm. Since the traditional line search technique …
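The Barzilai-Borwein rule replaces a line search with a secant-based curvature estimate. Below is a deterministic gradient-descent sketch of the BB1 formula on a small quadratic; the paper embeds the same rule inside SGD/SVRG (recomputed once per epoch), and the quadratic and fallback step here are illustrative, not from the paper:

```python
# Deterministic illustration of the BB1 step size on f(w) = 0.5*w'Aw - b'w.
A = [[3.0, 1.0], [1.0, 2.0]]
b = [1.0, 1.0]

def grad(w):
    return [sum(A[r][c] * w[c] for c in range(2)) - b[r] for r in range(2)]

def bb_gd(w, iters=30, eta0=0.1):
    g = grad(w)
    w_prev, g_prev = None, None
    for _ in range(iters):
        if sum(gi * gi for gi in g) < 1e-24:
            break  # already at the (numerical) optimum
        if w_prev is None:
            eta = eta0  # no curvature pair yet: fall back to a fixed step
        else:
            s = [w[j] - w_prev[j] for j in range(2)]
            y = [g[j] - g_prev[j] for j in range(2)]
            # BB1 step: eta = <s,s> / <s,y>, a secant estimate of 1/curvature
            eta = sum(si * si for si in s) / sum(si * yi for si, yi in zip(s, y))
        w_prev, g_prev = w, g
        w = [w[j] - eta * g[j] for j in range(2)]
        g = grad(w)
    return w

w_hat = bb_gd([0.0, 0.0])  # approaches the solution of A w = b, i.e. (0.2, 0.4)
```

The appeal in the stochastic setting is the same as here: eta adapts to local curvature from quantities the iteration already computes, with no extra function evaluations.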
Stochastic variance reduction methods for saddle-point problems
We consider convex-concave saddle-point problems where the objective functions may be
split in many components, and extend recent stochastic variance reduction methods (such …