Optimization for deep learning: An overview

RY Sun - Journal of the Operations Research Society of China, 2020 - Springer
Optimization is a critical component in deep learning. We think optimization for neural
networks is an interesting topic for theoretical research for several reasons. First, its …

Randomized numerical linear algebra: Foundations and algorithms

PG Martinsson, JA Tropp - Acta Numerica, 2020 - cambridge.org
This survey describes probabilistic algorithms for linear algebraic computations, such as
factorizing matrices and solving linear systems. It focuses on techniques that have a proven …
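
A representative primitive from this literature is the randomized range finder behind the randomized SVD: sample the range of a matrix with Gaussian test vectors, orthonormalize the sample, and solve a small exact problem. A minimal NumPy sketch (the rank k, the oversampling p, and the function name are illustrative choices, not notation from the survey):

```python
import numpy as np

def randomized_svd(A, k, p=10, seed=None):
    """Approximate rank-k SVD via a randomized range finder.

    Sketch the range of A with k + p Gaussian test vectors, orthonormalize
    the sample, then take an exact SVD of the small projected matrix.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    Omega = rng.standard_normal((n, k + p))   # random test matrix
    Q, _ = np.linalg.qr(A @ Omega)            # orthonormal basis for the sample
    B = Q.T @ A                               # small (k+p) x n problem
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_small)[:, :k], s[:k], Vt[:k, :]
```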

Optimization methods for large-scale machine learning

L Bottou, FE Curtis, J Nocedal - SIAM review, 2018 - SIAM
This paper provides a review and commentary on the past, present, and future of numerical
optimization algorithms in the context of machine learning applications. Through case …
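
The workhorse method analyzed in this line of work is stochastic gradient descent: replace the full gradient with an unbiased single-sample estimate. A minimal sketch on a least-squares loss (the loss and the decaying step-size schedule are illustrative choices, not prescriptions from the paper):

```python
import numpy as np

def sgd_least_squares(X, y, steps=100_000, step0=0.1, seed=None):
    """Plain SGD on f(w) = (1/2n) * sum_i (x_i . w - y_i)^2."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for t in range(steps):
        i = rng.integers(n)                  # sample one data point
        grad_i = (X[i] @ w - y[i]) * X[i]    # unbiased estimate of the full gradient
        w -= step0 / (1 + t / n) * grad_i    # slowly decaying step size
    return w
```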

Coordinate descent algorithms

SJ Wright - Mathematical programming, 2015 - Springer
Coordinate descent algorithms solve optimization problems by successively performing
approximate minimization along coordinate directions or coordinate hyperplanes. They have …
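
Concretely, for a positive-definite quadratic the coordinate minimizations are exact, and cyclic coordinate descent coincides with the Gauss-Seidel iteration for the associated linear system. A minimal sketch (the quadratic objective is an illustrative instance, not an example from the survey):

```python
import numpy as np

def cyclic_cd_quadratic(A, b, sweeps=100):
    """Cyclic coordinate descent for f(x) = x.A x / 2 - b.x, with A symmetric PD.

    Each inner step exactly minimizes f along one coordinate axis, i.e. it
    solves df/dx_i = 0 for x_i while holding the other coordinates fixed.
    """
    x = np.zeros_like(b, dtype=float)
    for _ in range(sweeps):
        for i in range(len(b)):
            x[i] += (b[i] - A[i] @ x) / A[i, i]
    return x
```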

Optimization for deep learning: theory and algorithms

R Sun - arXiv preprint arXiv:1912.08957, 2019 - arxiv.org
When and why can a neural network be successfully trained? This article provides an
overview of optimization algorithms and theory for training neural networks. First, we discuss …

[BOOK][B] First-order and stochastic optimization methods for machine learning

G Lan - 2020 - Springer
Since its beginning, optimization has played a vital role in data science. The analysis and
solution methods for many statistical and machine learning models rely on optimization. The …

Linear convergence of first order methods for non-strongly convex optimization

I Necoara, Y Nesterov, F Glineur - Mathematical Programming, 2019 - Springer
The standard assumption for proving linear convergence of first order methods for smooth
convex optimization is the strong convexity of the objective function, an assumption which …
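
For reference, the textbook result being relaxed here: if f is L-smooth and mu-strongly convex, gradient descent with step size 1/L converges linearly. In LaTeX (this is the standard bound, not a result of the paper):

```latex
% Linear rate under strong convexity, for x_{k+1} = x_k - (1/L) \nabla f(x_k):
\[
  f(x_k) - f^\star \;\le\; \left(1 - \frac{\mu}{L}\right)^{k} \bigl(f(x_0) - f^\star\bigr).
\]
% The paper shows rates of this type survive when strong convexity is replaced
% by weaker conditions, such as quadratic growth over the solution set.
```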

Efficiency of coordinate descent methods on huge-scale optimization problems

Y Nesterov - SIAM Journal on Optimization, 2012 - SIAM
In this paper we propose new methods for solving huge-scale optimization problems. For
problems of this size, even the simplest full-dimensional vector operations are very …
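
The cheap primitive that makes this scale is a single random coordinate update, stepped with that coordinate's own Lipschitz constant. A minimal sketch (the grad_i interface and the array of constants L are our illustrative assumptions, not the paper's notation):

```python
import numpy as np

def random_cd(grad_i, L, x0, iters=100_000, seed=None):
    """Random coordinate descent: pick i uniformly, step by -grad_i(x, i) / L[i].

    grad_i(x, i) returns the i-th partial derivative of the objective;
    L[i] is the Lipschitz constant of that partial derivative.
    """
    rng = np.random.default_rng(seed)
    x = x0.astype(float).copy()
    for _ in range(iters):
        i = rng.integers(len(x))
        x[i] -= grad_i(x, i) / L[i]
    return x
```

For a quadratic f(x) = x.A x / 2 - b.x one would pass grad_i = lambda x, i: A[i] @ x - b[i] and L = np.diag(A).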

Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function

P Richtárik, M Takáč - Mathematical Programming, 2014 - Springer
In this paper we develop a randomized block-coordinate descent method for minimizing the
sum of a smooth and a simple nonsmooth block-separable convex function and prove that it …
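
Because the nonsmooth term is block-separable, each coordinate update is a closed-form proximal step; with an l1 penalty it reduces to soft-thresholding. A minimal single-coordinate sketch on a lasso objective (the lasso instance is illustrative, not the paper's general setting):

```python
import numpy as np

def rcd_lasso(X, y, lam, iters=50_000, seed=None):
    """Randomized coordinate descent for min_w ||Xw - y||^2 / 2 + lam * ||w||_1."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    L = (X ** 2).sum(axis=0)          # coordinate-wise Lipschitz constants
    r = -y.astype(float)              # residual Xw - y, maintained incrementally
    for _ in range(iters):
        i = rng.integers(d)
        g = X[:, i] @ r                                      # partial gradient of smooth part
        z = w[i] - g / L[i]                                  # gradient step on coordinate i
        w_new = np.sign(z) * max(abs(z) - lam / L[i], 0.0)   # prox step (soft-threshold)
        r += (w_new - w[i]) * X[:, i]                        # keep residual in sync
        w[i] = w_new
    return w
```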

Randomized iterative methods for linear systems

RM Gower, P Richtárik - SIAM Journal on Matrix Analysis and Applications, 2015 - SIAM
We develop a novel, fundamental, and surprisingly simple randomized iterative method for
solving consistent linear systems. Our method has six different but equivalent interpretations …
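
A well-known special case recovered by this framework is randomized Kaczmarz: sample one equation and orthogonally project the iterate onto its solution hyperplane. A minimal sketch for a consistent system Ax = b (row sampling proportional to squared row norms, as in the standard analysis):

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=100_000, seed=None):
    """Randomized Kaczmarz for a consistent linear system Ax = b.

    Each step samples row i with probability ||a_i||^2 / ||A||_F^2 and
    projects x onto the hyperplane {x : a_i . x = b_i}.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    probs = (A ** 2).sum(axis=1) / (A ** 2).sum()
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=probs)
        a = A[i]
        x += (b[i] - a @ x) / (a @ a) * a   # orthogonal projection step
    return x
```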