PyHessian: Neural networks through the lens of the Hessian

Z Yao, A Gholami, K Keutzer… - 2020 IEEE international …, 2020 - ieeexplore.ieee.org
We present PyHessian, a new scalable framework that enables fast computation of
Hessian (i.e., second-order derivative) information for deep neural networks. PyHessian …
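
The workhorse behind this kind of tooling is the Hessian-vector product computed by double backpropagation; top eigenvalues, the trace, and spectral densities are then built on top of that primitive. A minimal PyTorch sketch of the primitive, not PyHessian's actual API (the model, data, and names below are placeholders):

```python
import torch

def hessian_vector_product(loss, params, vec):
    """Compute (Hessian of loss w.r.t. params) @ vec via double backprop,
    without ever materializing the Hessian."""
    # First backward pass, keeping the graph so we can differentiate again.
    grads = torch.autograd.grad(loss, params, create_graph=True)
    # Inner product <grad, vec>; a second backward pass yields H @ vec.
    grad_dot_v = sum((g * v).sum() for g, v in zip(grads, vec))
    return torch.autograd.grad(grad_dot_v, params)

# Illustrative usage on a tiny placeholder model.
model = torch.nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)
loss = torch.nn.functional.mse_loss(model(x), y)
params = list(model.parameters())
v = [torch.randn_like(p) for p in params]
hv = hessian_vector_product(loss, params, v)
```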

Second-order stochastic optimization for machine learning in linear time

N Agarwal, B Bullins, E Hazan - Journal of Machine Learning Research, 2017 - jmlr.org
First-order stochastic methods are the state-of-the-art in large-scale machine learning
optimization owing to efficient per-iteration complexity. Second-order methods, while able to …
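
One standard way to get Newton-like steps at roughly first-order cost is to estimate inverse-Hessian-vector products with a truncated Neumann series driven by stochastic Hessian-vector products. A rough NumPy sketch of that estimator, assuming the Hessian has been scaled so its eigenvalues lie in (0, 1]; the matrix and depth below are illustrative stand-ins, not the paper's algorithm:

```python
import numpy as np

def neumann_inverse_hvp(hvp, v, depth=50):
    """Estimate H^{-1} v via the recursion p_j = v + (I - H) p_{j-1},
    which converges when the eigenvalues of H lie in (0, 1]."""
    p = v.copy()
    for _ in range(depth):
        p = v + p - hvp(p)   # p <- v + (I - H) p
    return p

# Illustrative usage: a small SPD matrix stands in for a stochastic
# Hessian-vector product oracle.
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
H = A @ A.T + np.eye(5)
H /= np.linalg.norm(H, 2) * 1.1          # scale eigenvalues into (0, 1)
v = rng.normal(size=5)
approx = neumann_inverse_hvp(lambda u: H @ u, v, depth=500)
print(np.allclose(approx, np.linalg.solve(H, v), atol=1e-3))
```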

Newton-type methods for non-convex optimization under inexact Hessian information

P Xu, F Roosta, MW Mahoney - Mathematical Programming, 2020 - Springer
We consider variants of trust-region and adaptive cubic regularization methods for non-
convex optimization, in which the Hessian matrix is approximated. Under certain condition …
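
To make the trust-region template concrete: each step minimizes a quadratic model (built from a possibly inexact Hessian) inside a ball, then grows or shrinks the ball according to how well the model predicted the actual decrease. A toy NumPy sketch using a Cauchy-point subproblem solve; the thresholds and test function are illustrative, not the paper's algorithm:

```python
import numpy as np

def cauchy_point(g, H, radius):
    """Minimizer of the quadratic model along -g, restricted to the trust region."""
    t = radius / np.linalg.norm(g)
    gHg = g @ H @ g
    if gHg > 0:
        t = min(t, (g @ g) / gHg)
    return -t * g

def trust_region_step(f, grad, hess, x, radius):
    g, H = grad(x), hess(x)                      # H may be inexact / subsampled
    s = cauchy_point(g, H, radius)
    predicted = -(g @ s + 0.5 * s @ H @ s)       # model decrease
    actual = f(x) - f(x + s)                     # true decrease
    rho = actual / predicted
    if rho > 0.75:                               # good model: accept and expand
        return x + s, 2.0 * radius
    if rho > 0.1:                                # acceptable: accept, keep radius
        return x + s, radius
    return x, 0.5 * radius                       # poor model: reject and shrink

# Illustrative usage on a simple test function.
f = lambda x: np.sum(x**4) + np.sum(x**2)
grad = lambda x: 4 * x**3 + 2 * x
hess = lambda x: np.diag(12 * x**2 + 2)
x, radius = np.ones(3), 1.0
for _ in range(30):
    x, radius = trust_region_step(f, grad, hess, x, radius)
```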

Shampoo: Preconditioned stochastic tensor optimization

V Gupta, T Koren, Y Singer - International Conference on …, 2018 - proceedings.mlr.press
Preconditioned gradient methods are among the most general and powerful tools in
optimization. However, preconditioning requires storing and manipulating prohibitively large …
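
For a matrix-shaped parameter, the idea is to keep two small per-dimension statistics and precondition the gradient from both sides with their inverse fourth roots, instead of storing one full-sized preconditioner. A rough NumPy sketch of that update; the damping, learning rate, and plain (non-decayed) accumulation are illustrative simplifications:

```python
import numpy as np

def inv_root(mat, p, damping=1e-4):
    """Symmetric PSD matrix raised to the power -1/p via eigendecomposition."""
    w, q = np.linalg.eigh(mat + damping * np.eye(mat.shape[0]))
    return (q * w ** (-1.0 / p)) @ q.T

def shampoo_update(W, grad, L, R, lr=0.1):
    """One preconditioned step for a matrix parameter W of shape (m, n)."""
    L += grad @ grad.T          # left statistic,  (m, m)
    R += grad.T @ grad          # right statistic, (n, n)
    W -= lr * inv_root(L, 4) @ grad @ inv_root(R, 4)
    return W, L, R

# Illustrative usage on a small least-squares problem.
rng = np.random.default_rng(0)
X, Y = rng.normal(size=(32, 6)), rng.normal(size=(32, 3))
W = np.zeros((6, 3))
L, R = np.zeros((6, 6)), np.zeros((3, 3))
for _ in range(100):
    grad = X.T @ (X @ W - Y) / len(X)
    W, L, R = shampoo_update(W, grad, L, R)
```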

Exact and inexact subsampled Newton methods for optimization

R Bollapragada, RH Byrd… - IMA Journal of Numerical …, 2019 - academic.oup.com
The paper studies the solution of stochastic optimization problems in which approximations
to the gradient and Hessian are obtained through subsampling. We first consider Newton …
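
The basic template is to compute a (full or sampled) gradient, estimate the Hessian from a smaller subsample, and solve the resulting Newton system. A small NumPy sketch for regularized logistic regression with an exact solve of the subsampled system; the sample size and regularizer are illustrative choices, not the paper's prescriptions:

```python
import numpy as np

def subsampled_newton_step(w, X, y, hess_sample_size, rng, reg=1e-3):
    """One Newton step: full gradient, Hessian estimated from a subsample."""
    n = len(X)
    p = 1.0 / (1.0 + np.exp(-X @ w))
    grad = X.T @ (p - y) / n + reg * w                  # full gradient

    idx = rng.choice(n, size=hess_sample_size, replace=False)
    Xs, ps = X[idx], p[idx]
    D = ps * (1.0 - ps)                                 # per-sample curvature
    H = (Xs * D[:, None]).T @ Xs / hess_sample_size + reg * np.eye(len(w))
    return w - np.linalg.solve(H, grad)                 # exact subproblem solve

# Illustrative usage on synthetic data.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = (X @ rng.normal(size=10) + 0.1 * rng.normal(size=2000) > 0).astype(float)
w = np.zeros(10)
for _ in range(10):
    w = subsampled_newton_step(w, X, y, hess_sample_size=200, rng=rng)
```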

Stochastic block BFGS: Squeezing more curvature out of data

R Gower, D Goldfarb… - … Conference on Machine …, 2016 - proceedings.mlr.press
We propose a novel limited-memory stochastic block BFGS update for incorporating
enriched curvature information in stochastic approximation methods. In our method, the …
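
The curvature these methods extract comes from pairing a displacement in parameter space with the matching change in (subsampled) gradient information and folding the pair into an inverse-Hessian estimate. As a simplified illustration, here is the classical single-pair BFGS inverse update fed with sampled directions and exact curvature responses; this is plain BFGS, not the block update proposed in the paper:

```python
import numpy as np

def bfgs_inverse_update(B, s, y, eps=1e-10):
    """Standard BFGS update of an inverse-Hessian estimate B from a pair (s, y),
    where s is a displacement and y the matching gradient (curvature) change."""
    sy = s @ y
    if sy < eps:                      # skip pairs violating the curvature condition
        return B
    rho = 1.0 / sy
    V = np.eye(len(s)) - rho * np.outer(s, y)
    return V @ B @ V.T + rho * np.outer(s, s)

# Illustrative usage: refine an inverse-Hessian estimate for a fixed SPD matrix
# from randomly sampled directions s with exact responses y = H s.
rng = np.random.default_rng(0)
A = rng.normal(size=(6, 6))
H = A @ A.T + np.eye(6)
B = np.eye(6)
for _ in range(200):
    s = rng.normal(size=6)
    B = bfgs_inverse_update(B, s, H @ s)
# The error shrinks as more curvature pairs are folded in.
print(np.linalg.norm(B - np.linalg.inv(H)))
```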

Faster differentially private convex optimization via second-order methods

A Ganesh, M Haghifam, T Steinke… - Advances in Neural …, 2024 - proceedings.neurips.cc
Differentially private (stochastic) gradient descent is the workhorse of DP machine
learning in both the convex and non-convex settings. Without privacy constraints, second …
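
The generic recipe for privatizing a second-order step is to bound each example's influence by clipping, add Gaussian noise to the gradient and Hessian statistics, and then take a regularized Newton step on the noisy quantities. A toy NumPy sketch of that recipe for logistic regression; the clipping threshold, noise scales, and regularizer below are illustrative placeholders, are not calibrated to any privacy budget, and do not reproduce the paper's method:

```python
import numpy as np

def noisy_newton_step(w, X, y, rng, clip=1.0, noise_grad=0.1,
                      noise_hess=0.1, reg=1e-1):
    """One Newton-type step on clipped, noised statistics (toy sketch only)."""
    n, d = X.shape
    p = 1.0 / (1.0 + np.exp(-X @ w))

    # Per-example gradients, clipped to bound each example's influence.
    per_ex = X * (p - y)[:, None]
    scale = np.maximum(np.linalg.norm(per_ex, axis=1) / clip, 1.0)
    grad = (per_ex / scale[:, None]).sum(axis=0) / n
    grad += rng.normal(scale=noise_grad * clip / n, size=d)

    # Hessian estimate plus symmetric Gaussian noise and damping.
    D = p * (1.0 - p)
    H = (X * D[:, None]).T @ X / n
    E = rng.normal(scale=noise_hess / n, size=(d, d))
    H += (E + E.T) / 2 + reg * np.eye(d)
    return w - np.linalg.solve(H, grad)

# Illustrative usage on synthetic data.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = (X @ np.ones(5) > 0).astype(float)
w = np.zeros(5)
for _ in range(5):
    w = noisy_newton_step(w, X, y, rng)
```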

An overview of stochastic quasi-Newton methods for large-scale machine learning

TD Guo, Y Liu, CY Han - Journal of the Operations Research Society of …, 2023 - Springer
Numerous intriguing optimization problems arise as a result of the advancement of machine
learning. The stochastic first-order method is the predominant choice for those problems due …

Sub-sampled cubic regularization for non-convex optimization

JM Kohler, A Lucchi - International Conference on Machine …, 2017 - proceedings.mlr.press
We consider the minimization of non-convex functions that typically arise in machine
learning. Specifically, we focus our attention on a variant of trust region methods known as …
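
Each outer iteration of cubic regularization minimizes a local model of the form m(s) = gᵀs + ½ sᵀH s + (σ/3)‖s‖³, with the sub-sampled variants replacing g and H by estimates from a random subsample. A toy NumPy sketch that solves the subproblem by plain gradient descent; the solver, step sizes, and test function are illustrative simplifications rather than the paper's procedure:

```python
import numpy as np

def solve_cubic_subproblem(g, H, sigma, steps=500, lr=0.01):
    """Minimize m(s) = g.s + 0.5 s.H.s + (sigma/3)||s||^3 by gradient descent
    (a simple stand-in for the more careful subproblem solvers used in practice)."""
    s = np.zeros_like(g)
    for _ in range(steps):
        grad_m = g + H @ s + sigma * np.linalg.norm(s) * s
        s -= lr * grad_m
    return s

# Illustrative usage: a few outer steps on a toy non-convex function.
f = lambda x: np.sum(x**4 - x**2)
grad = lambda x: 4 * x**3 - 2 * x
hess = lambda x: np.diag(12 * x**2 - 2)
x, sigma = np.full(3, 0.3), 1.0
for _ in range(20):
    s = solve_cubic_subproblem(grad(x), hess(x), sigma)
    x = x + s
```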

Sub-sampled Newton methods

F Roosta-Khorasani, MW Mahoney - Mathematical Programming, 2019 - Springer
For large-scale finite-sum minimization problems, we study non-asymptotic and high-
probability global as well as local convergence properties of variants of Newton's method …
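
When even forming the subsampled Hessian is too expensive, the Newton system is typically solved inexactly with conjugate gradients, which only requires Hessian-vector products on the subsample. A small NumPy sketch of that inexact variant; the least-squares-style curvature, damping, and sample sizes are illustrative placeholders:

```python
import numpy as np

def conjugate_gradient(matvec, b, tol=1e-8, max_iter=200):
    """Solve A x = b for symmetric positive-definite A, given only
    matrix-vector products with A."""
    x = np.zeros_like(b)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter):
        Ap = matvec(p)
        alpha = rs / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        rs_new = r @ r
        if np.sqrt(rs_new) < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x

# Inexact Newton step on a least-squares-style problem: the sub-sampled
# Hessian is never formed, only applied to vectors inside CG.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))
idx = rng.choice(400, size=40, replace=False)       # Hessian subsample
def sub_hvp(v):
    Xs = X[idx]
    return Xs.T @ (Xs @ v) / len(idx) + 1e-3 * v    # damping keeps it SPD
grad = rng.normal(size=8)                           # placeholder gradient
newton_step = -conjugate_gradient(sub_hvp, grad)
```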