PyHessian: Neural networks through the lens of the Hessian
We present PYHESSIAN, a new scalable framework that enables fast computation of
Hessian (i.e., second-order derivative) information for deep neural networks. PYHESSIAN …
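The basic primitive behind frameworks of this kind is the Hessian-vector product, which autograd can compute without ever materializing the full Hessian. A minimal sketch using generic PyTorch autograd (not PyHessian's own API; the function name is ours):

```python
import torch

def hessian_vector_product(loss, params, vec):
    # First backward pass: gradients w.r.t. the parameters, keeping the
    # graph so a second differentiation is possible.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # Differentiating <grad, vec> a second time yields H @ vec.
    dot = sum((g * v).sum() for g, v in zip(grads, vec))
    return torch.autograd.grad(dot, params)
```

Quantities such as the Hessian trace or its top eigenvalues can then be estimated from repeated products with random or iteratively refined vectors (Hutchinson's method, power iteration).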
Second-order stochastic optimization for machine learning in linear time
First-order stochastic methods are the state-of-the-art in large-scale machine learning
optimization owing to efficient per-iteration complexity. Second-order methods, while able to …
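One standard route to second-order steps at roughly first-order cost is to approximate inverse-Hessian-vector products iteratively from (possibly mini-batch) Hessian-vector products. A minimal sketch of a Neumann-series estimator, assuming `hvp(v)` returns H @ v and the eigenvalues of `scale * H` lie in (0, 2); this illustrates the general idea rather than the paper's exact algorithm:

```python
import numpy as np

def inverse_hvp(hvp, grad, steps=100, scale=1.0):
    # Fixed-point recursion v <- grad + (I - scale*H) v converges to
    # (scale*H)^{-1} grad, so scale * v approximates H^{-1} grad.
    v = grad.copy()
    for _ in range(steps):
        v = grad + v - scale * hvp(v)
    return scale * v
```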
Newton-type methods for non-convex optimization under inexact Hessian information
We consider variants of trust-region and adaptive cubic regularization methods for non-
convex optimization, in which the Hessian matrix is approximated. Under certain condition …
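For context, the subproblems these methods solve at iterate x_k, with an inexact approximation H_k of the Hessian, take the standard textbook forms below (not quoted from the paper):

```latex
% Trust-region step with inexact Hessian H_k \approx \nabla^2 f(x_k):
\min_{\|s\| \le \Delta_k} \; \nabla f(x_k)^\top s + \tfrac{1}{2}\, s^\top H_k s
% Adaptive cubic-regularization step:
\min_{s} \; \nabla f(x_k)^\top s + \tfrac{1}{2}\, s^\top H_k s + \tfrac{\sigma_k}{3}\,\|s\|^3
```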
Shampoo: Preconditioned stochastic tensor optimization
Preconditioned gradient methods are among the most general and powerful tools in
optimization. However, preconditioning requires storing and manipulating prohibitively large …
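The storage issue is handled by keeping one small statistic per tensor dimension instead of a full-matrix preconditioner. A minimal NumPy sketch of the matrix-case update (our notation; initialization with a small multiple of the identity and all efficiency tricks omitted):

```python
import numpy as np

def matrix_power(mat, p, eps=1e-6):
    # Power of a symmetric PSD matrix via its eigendecomposition.
    w, v = np.linalg.eigh(mat)
    return (v * np.maximum(w, eps) ** p) @ v.T

def shampoo_step(W, G, L, R, lr=0.1):
    # Accumulate left/right Kronecker-factor statistics, then
    # precondition the gradient on both sides.
    L = L + G @ G.T
    R = R + G.T @ G
    W = W - lr * matrix_power(L, -0.25) @ G @ matrix_power(R, -0.25)
    return W, L, R
```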
Exact and inexact subsampled Newton methods for optimization
R Bollapragada, RH Byrd… - IMA Journal of Numerical …, 2019 - academic.oup.com
The paper studies the solution of stochastic optimization problems in which approximations
to the gradient and Hessian are obtained through subsampling. We first consider Newton …
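A bare-bones version of the scheme estimates the gradient and Hessian on independent subsamples and takes a damped Newton step (a sketch in our own naming; the paper's step-size rules, sample-size growth, and inexact linear solves are omitted):

```python
import numpy as np

def subsampled_newton_step(x, grad_i, hess_i, data, n_grad, n_hess,
                           lr=1.0, damping=1e-4, rng=None):
    # grad_i(x, d) and hess_i(x, d) return the per-example gradient / Hessian.
    rng = rng or np.random.default_rng()
    g_idx = rng.choice(len(data), size=n_grad, replace=False)
    h_idx = rng.choice(len(data), size=n_hess, replace=False)
    g = np.mean([grad_i(x, data[i]) for i in g_idx], axis=0)
    H = np.mean([hess_i(x, data[i]) for i in h_idx], axis=0)
    # Damped Newton step on the subsampled model.
    return x - lr * np.linalg.solve(H + damping * np.eye(x.size), g)
```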
Stochastic block BFGS: Squeezing more curvature out of data
R Gower, D Goldfarb… - … Conference on Machine …, 2016 - proceedings.mlr.press
We propose a novel limited-memory stochastic block BFGS update for incorporating
enriched curvature information in stochastic approximation methods. In our method, the …
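For reference, block BFGS updates fit a whole block of curvature information rather than a single secant pair: given directions D and the corresponding Hessian actions Y, the inverse-Hessian estimate is updated so that the block secant condition H_+ Y = D holds (standard form, written in our notation rather than the paper's):

```latex
% Block BFGS update of the inverse-Hessian estimate H, with
% \Delta = (D^\top Y)^{-1} and Y \approx \nabla^2 f(x)\, D:
H_{+} \;=\; D\,\Delta\,D^\top
        \;+\; \bigl(I - D\,\Delta\,Y^\top\bigr)\, H \,\bigl(I - Y\,\Delta\,D^\top\bigr)
```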
Faster differentially private convex optimization via second-order methods
Differentially private (stochastic) gradient descent is the workhorse of private machine
learning in both the convex and non-convex settings. Without privacy constraints, second …
An overview of stochastic quasi-Newton methods for large-scale machine learning
TD Guo, Y Liu, CY Han - Journal of the Operations Research Society of …, 2023 - Springer
Numerous intriguing optimization problems arise from the advancement of machine
learning. The stochastic first-order method is the predominant choice for these problems due …
Sub-sampled cubic regularization for non-convex optimization
We consider the minimization of non-convex functions that typically arise in machine
learning. Specifically, we focus our attention on a variant of trust region methods known as …
Sub-sampled Newton methods
F Roosta-Khorasani, MW Mahoney - Mathematical Programming, 2019 - Springer
For large-scale finite-sum minimization problems, we study non-asymptotic and high-
probability global as well as local convergence properties of variants of Newton's method …