Optimization for deep learning: An overview
RY Sun - Journal of the Operations Research Society of China, 2020 - Springer
Optimization is a critical component in deep learning. We think optimization for neural
networks is an interesting topic for theoretical research for several reasons. First, its …
Randomized numerical linear algebra: Foundations and algorithms
PG Martinsson, JA Tropp - Acta Numerica, 2020 - Cambridge University Press
This survey describes probabilistic algorithms for linear algebraic computations, such as
factorizing matrices and solving linear systems. It focuses on techniques that have a proven …
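A minimal sketch of the kind of probabilistic factorization the survey covers: a randomized range finder feeding a small dense SVD, in the Halko-Martinsson-Tropp style. The function name, oversampling parameter, and test sizes below are illustrative, not taken from the survey.

```python
import numpy as np

def randomized_svd(A, rank, oversample=10, seed=0):
    """Approximate truncated SVD via a randomized range finder.

    Sketch A with a Gaussian test matrix, orthonormalize the sample
    to estimate range(A), then run a small dense SVD on the projection.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    k = min(rank + oversample, n)
    Omega = rng.standard_normal((n, k))   # Gaussian test matrix
    Q, _ = np.linalg.qr(A @ Omega)        # orthonormal basis for the sample
    B = Q.T @ A                           # small (k x n) projected matrix
    U_small, s, Vt = np.linalg.svd(B, full_matrices=False)
    return (Q @ U_small)[:, :rank], s[:rank], Vt[:rank]

A = np.random.default_rng(1).standard_normal((500, 200))
U, s, Vt = randomized_svd(A, rank=20)   # A is approximated by U @ np.diag(s) @ Vt
```

The only large-matrix operations are the two passes over A; everything else happens on k-column sketches, which is what makes this family of methods cheap.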
Optimization methods for large-scale machine learning
L Bottou, FE Curtis, J Nocedal - SIAM Review, 2018 - SIAM
This paper provides a review and commentary on the past, present, and future of numerical
optimization algorithms in the context of machine learning applications. Through case …
Coordinate descent algorithms
SJ Wright - Mathematical Programming, 2015 - Springer
Coordinate descent algorithms solve optimization problems by successively performing
approximate minimization along coordinate directions or coordinate hyperplanes. They have …
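A minimal sketch of the basic scheme the survey describes, here as cyclic coordinate descent with exact one-dimensional minimization on a least-squares objective. The function name and problem sizes are illustrative assumptions, not from the survey.

```python
import numpy as np

def coordinate_descent_lstsq(A, b, sweeps=50):
    """Cyclic coordinate descent for min_x 0.5 * ||A x - b||^2.

    Each inner step exactly minimizes the objective along one
    coordinate direction, which has a closed form for least squares.
    """
    n = A.shape[1]
    col_sq = (A * A).sum(axis=0)   # per-column squared norms
    x = np.zeros(n)
    r = -b                         # residual A x - b, with x = 0
    for _ in range(sweeps):
        for i in range(n):
            step = -(A[:, i] @ r) / col_sq[i]   # exact 1-D minimizer
            x[i] += step
            r += step * A[:, i]                 # keep residual current
    return x

rng = np.random.default_rng(0)
A, b = rng.standard_normal((100, 20)), rng.standard_normal(100)
x = coordinate_descent_lstsq(A, b)
```

Maintaining the residual incrementally keeps each coordinate update at O(m) cost, which is the efficiency argument that motivates these methods.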
Optimization for deep learning: theory and algorithms
R Sun - arXiv preprint arXiv:1912.08957, 2019 - arxiv.org
When and why can a neural network be successfully trained? This article provides an
overview of optimization algorithms and theory for training neural networks. First, we discuss …
[BOOK][B] First-order and stochastic optimization methods for machine learning
G Lan - 2020 - Springer
Since its beginning, optimization has played a vital role in data science. The analysis and
solution methods for many statistical and machine learning models rely on optimization. The …
Linear convergence of first order methods for non-strongly convex optimization
I Necoara, Y Nesterov, F Glineur - Mathematical Programming, 2019 - Springer
The standard assumption for proving linear convergence of first order methods for smooth
convex optimization is the strong convexity of the objective function, an assumption which …
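The relaxation at issue can be made concrete. Below is a sketch of the two conditions, taking quadratic growth as the substitute assumption (this line of work studies a family of such relaxations, so treat this as one representative); here $X^{\star}$ is the solution set and $f^{\star}$ the optimal value.

```latex
\[
  \text{strong convexity: } \quad
  f(y) \ge f(x) + \langle \nabla f(x),\, y - x \rangle
        + \tfrac{\mu}{2}\,\lVert y - x \rVert^{2},
\]
\[
  \text{quadratic growth: } \quad
  f(x) - f^{\star} \ge \tfrac{\kappa}{2}\,
        \operatorname{dist}(x, X^{\star})^{2}.
\]
```

Quadratic growth only constrains $f$ near the solution set, so it can hold for objectives that are not strongly convex, for example least squares with a rank-deficient matrix.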
Efficiency of coordinate descent methods on huge-scale optimization problems
Y Nesterov - SIAM Journal on Optimization, 2012 - SIAM
In this paper we propose new methods for solving huge-scale optimization problems. For
problems of this size, even the simplest full-dimensional vector operations are very …
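A minimal sketch of the core idea on a quadratic objective: touch one coordinate of the gradient per iteration and step with the per-coordinate Lipschitz constant. Both uniform and Lipschitz-weighted samplings are analyzed in this line of work; the weighting and test function below are illustrative assumptions.

```python
import numpy as np

def rcdm_quadratic(Q, c, iters=5000, seed=0):
    """Random coordinate descent for min_x 0.5*x^T Q x - c^T x, Q positive definite.

    Only one coordinate of the gradient is formed per iteration, so each
    step costs O(n) instead of the O(n^2) of a full gradient evaluation.
    """
    rng = np.random.default_rng(seed)
    n = Q.shape[0]
    L = np.diag(Q).copy()        # coordinate-wise Lipschitz constants
    p = L / L.sum()              # sample stiffer coordinates more often
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(n, p=p)
        g_i = Q[i] @ x - c[i]    # i-th partial derivative only
        x[i] -= g_i / L[i]       # per-coordinate step size 1/L_i
    return x

rng = np.random.default_rng(1)
M = rng.standard_normal((50, 50))
Q = M.T @ M + np.eye(50)         # symmetric positive definite test matrix
x = rcdm_quadratic(Q, rng.standard_normal(50))
```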
Iteration complexity of randomized block-coordinate descent methods for minimizing a composite function
P Richtárik, M Takáč - Mathematical Programming, 2014 - Springer
In this paper we develop a randomized block-coordinate descent method for minimizing the
sum of a smooth and a simple nonsmooth block-separable convex function and prove that it …
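A minimal sketch of the composite setting with blocks of size one, taking the lasso as the smooth-plus-separable example: each update is a coordinate gradient step on the smooth part followed by a one-dimensional prox (soft-thresholding). Function names and parameters are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def soft_threshold(v, t):
    """Prox of t*|.|: the soft-thresholding operator."""
    return np.sign(v) * np.maximum(abs(v) - t, 0.0)

def rcd_lasso(A, b, lam, iters=20000, seed=0):
    """Randomized coordinate descent for min_x 0.5*||Ax - b||^2 + lam*||x||_1.

    The nonsmooth term is separable across coordinates, so the prox
    step decouples into an exact scalar update.
    """
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    L = (A * A).sum(axis=0)      # per-coordinate Lipschitz constants
    x = np.zeros(n)
    r = -b                       # residual A x - b, with x = 0
    for _ in range(iters):
        i = rng.integers(n)
        g = A[:, i] @ r          # partial derivative of the smooth part
        x_new = soft_threshold(x[i] - g / L[i], lam / L[i])
        r += (x_new - x[i]) * A[:, i]
        x[i] = x_new
    return x

rng = np.random.default_rng(0)
A, b = rng.standard_normal((100, 30)), rng.standard_normal(100)
x = rcd_lasso(A, b, lam=0.1)
```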
Randomized iterative methods for linear systems
RM Gower, P Richtárik - SIAM Journal on Matrix Analysis and Applications, 2015 - SIAM
We develop a novel, fundamental, and surprisingly simple randomized iterative method for
solving consistent linear systems. Our method has six different but equivalent interpretations …
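One well-known member of this family is randomized Kaczmarz, which the sketch-and-project viewpoint recovers as a special case. A minimal sketch with squared-row-norm sampling in the Strohmer-Vershynin style; the function name and iteration budget are illustrative assumptions.

```python
import numpy as np

def randomized_kaczmarz(A, b, iters=10000, seed=0):
    """Randomized Kaczmarz for a consistent linear system A x = b.

    Each step projects the current iterate onto the solution set of a
    single randomly chosen equation a_i^T x = b_i.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    row_sq = (A * A).sum(axis=1)
    p = row_sq / row_sq.sum()    # sample rows by squared norm
    x = np.zeros(n)
    for _ in range(iters):
        i = rng.choice(m, p=p)
        x += (b[i] - A[i] @ x) / row_sq[i] * A[i]
    return x

rng = np.random.default_rng(2)
A = rng.standard_normal((200, 50))
b = A @ rng.standard_normal(50)   # consistent by construction
x = randomized_kaczmarz(A, b)
```

Each iteration touches one row of A, so the method never forms a full matrix-vector product, which is the point of this class of randomized solvers.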