Second-order optimization with lazy Hessians
We analyze Newton's method with lazy Hessian updates for solving general, possibly non-convex optimization problems. We propose to reuse a previously seen Hessian for several …
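A minimal sketch of the lazy-Hessian idea described in the snippet: the Hessian is recomputed only every m iterations and reused in between, while the gradient is refreshed at every step. The function names, the small damping term, and the fixed update schedule are illustrative assumptions, not the paper's exact method.

```python
import numpy as np

def lazy_newton(f_grad, f_hess, x0, m=5, n_iters=50, reg=1e-8):
    """Newton-type iterations that reuse a 'lazy' Hessian snapshot for m steps.

    f_grad, f_hess: callables returning the gradient / Hessian at a point.
    m: how many iterations a single Hessian snapshot is reused for.
    reg: small damping for numerical stability (an assumption here,
         not part of the referenced analysis).
    """
    x = np.asarray(x0, dtype=float)
    H = None
    for k in range(n_iters):
        if k % m == 0:               # refresh the Hessian only every m steps
            H = f_hess(x)
        g = f_grad(x)                # the gradient is always up to date
        x = x - np.linalg.solve(H + reg * np.eye(len(x)), g)
    return x

# toy usage on a convex quadratic-plus-quartic objective
if __name__ == "__main__":
    A = np.diag([1.0, 10.0, 100.0])
    f_grad = lambda x: A @ x + 4 * x**3
    f_hess = lambda x: A + np.diag(12 * x**2)
    print(lazy_newton(f_grad, f_hess, x0=np.ones(3), m=5))
```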
SANIA: Polyak-type optimization framework leads to scale-invariant stochastic algorithms
Adaptive optimization methods are widely recognized as among the most popular approaches for training Deep Neural Networks (DNNs). Techniques such as Adam …
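The Polyak-type idea behind such frameworks can be illustrated with the classical stochastic Polyak step size, which scales each stochastic step by (f_i(x) − f_i*)/‖∇f_i(x)‖². The generic sketch below assumes the per-sample optimal values f_i* are known (or taken as zero for interpolating models); it is not the SANIA algorithm itself.

```python
import numpy as np

def sps_step(x, grad_i, loss_i, loss_i_star=0.0, c=1.0, eps=1e-12):
    """One stochastic Polyak step: x - (f_i(x) - f_i*) / (c * ||g_i||^2) * g_i.

    loss_i_star is the per-sample optimum (0 for interpolating models);
    c and eps are illustrative safeguards, not taken from the paper.
    """
    gap = max(loss_i - loss_i_star, 0.0)
    step_size = gap / (c * np.dot(grad_i, grad_i) + eps)
    return x - step_size * grad_i
```

Multiplying the loss by a constant rescales the gap linearly and the squared gradient norm quadratically, so the resulting step is unchanged; this is the scale-invariance property highlighted in the title.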
Advancing the lower bounds: An accelerated, stochastic, second-order method with optimal adaptation to inexactness
We present a new accelerated stochastic second-order method that is robust to both gradient and Hessian inexactness, which typically occurs in machine learning. We establish …
Accelerated adaptive cubic regularized quasi-Newton methods
In this paper, we propose Cubic Regularized Quasi-Newton Methods for (strongly) star-convex and Accelerated Cubic Regularized Quasi-Newton for convex optimization. The …
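The core of cubic regularized (quasi-)Newton methods is the step obtained by minimizing the cubic model m(h) = ⟨g, h⟩ + ½⟨Bh, h⟩ + (M/6)‖h‖³, where B is a (quasi-Newton) Hessian approximation. The sketch below solves this subproblem for a positive semidefinite B via the standard scalar reformulation (B + (M r/2) I) h = −g with r = ‖h‖, found by bisection; the bracketing and tolerances are assumptions made for illustration.

```python
import numpy as np

def cubic_step(B, g, M, tol=1e-10, max_iter=200):
    """Minimize <g,h> + 0.5 h^T B h + (M/6)||h||^3 for PSD B.

    Uses the optimality condition (B + (M*r/2) I) h = -g with r = ||h||,
    solved for r by bisection on ||h(r)|| - r, which is decreasing in r.
    """
    d = len(g)
    I = np.eye(d)

    def h_of(r):
        return np.linalg.solve(B + 0.5 * M * r * I + 1e-14 * I, -g)

    # bracket the root: grow r_hi until ||h(r_hi)|| <= r_hi
    r_lo, r_hi = 0.0, 1.0
    while np.linalg.norm(h_of(r_hi)) > r_hi:
        r_hi *= 2.0
    for _ in range(max_iter):
        r = 0.5 * (r_lo + r_hi)
        if np.linalg.norm(h_of(r)) > r:
            r_lo = r
        else:
            r_hi = r
        if r_hi - r_lo < tol:
            break
    return h_of(0.5 * (r_lo + r_hi))
```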
Minimizing quasi-self-concordant functions by gradient regularization of Newton method
N Doikov - arXiv preprint arXiv:2308.14742, 2023 - arxiv.org
We study composite convex optimization problems with a Quasi-Self-Concordant smooth component. This problem class naturally interpolates between classic Self-Concordant …
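Gradient regularization of the Newton method replaces the plain Newton step by x⁺ = x − (∇²f(x) + λI)⁻¹∇f(x), where the damping λ is tied to the current gradient norm. The proportionality λ = c·M·‖∇f(x)‖ below is an assumption chosen to illustrate the gradient-norm scaling; the paper's exact rule and constants may differ.

```python
import numpy as np

def grad_regularized_newton_step(hess, grad, x, M=1.0, c=1.0):
    """One gradient-regularized Newton step:
       x+ = x - (H + lam * I)^{-1} g  with  lam proportional to ||g||.

    The constant c and the scaling lam = c * M * ||g|| are illustrative
    assumptions; only the general 'regularize by the gradient norm' idea
    is taken from the snippet above.
    """
    g = grad(x)
    H = hess(x)
    lam = c * M * np.linalg.norm(g)
    return x - np.linalg.solve(H + lam * np.eye(len(x)), g)
```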
OPTAMI: Global Superlinear Convergence of High-order Methods
Second-order methods for convex optimization outperform first-order methods in terms of theoretical iteration convergence, achieving rates up to $O(k^{-5})$ for highly-smooth …
Sketch-and-project meets Newton method: Global $O(k^{-2})$ convergence with low-rank updates
S Hanzely - 2023 - repository.kaust.edu.sa
In this paper, we propose the first sketch-and-project Newton method with fast $O(k^{-2})$ global convergence rate for self-concordant functions. Our method, SGN, can be viewed in …
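A sketch-and-project Newton step restricts the Newton system to a random low-dimensional subspace: with a sketch matrix S ∈ R^{d×τ}, only the τ×τ matrix Sᵀ∇²f(x)S has to be formed and solved. The Gaussian sketch and the damped step size based on a sketched Newton-decrement-like quantity below are illustrative assumptions in the spirit of the snippet, not a verbatim restatement of SGN.

```python
import numpy as np

def sketched_newton_step(hess, grad, x, tau=2, rng=None):
    """One sketch-and-project Newton step in a random tau-dimensional subspace.

    S is a Gaussian sketch (an assumption; other distributions are possible).
    The damping 1/(1 + lam) uses a sketched Newton-decrement-like quantity,
    again as an illustrative choice.
    """
    rng = np.random.default_rng(rng)
    d = len(x)
    S = rng.standard_normal((d, tau))          # random sketch matrix
    g, H = grad(x), hess(x)
    Hs = S.T @ H @ S                           # small tau x tau system
    gs = S.T @ g
    direction = S @ np.linalg.solve(Hs, gs)    # lift back to R^d
    lam = np.sqrt(max(gs @ np.linalg.solve(Hs, gs), 0.0))
    return x - direction / (1.0 + lam)         # damped step
```

Only the sketched system is ever factored, which is what makes the per-iteration cost low-rank in the sense of the title.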
Adaptive Optimization Algorithms for Machine Learning
S Hanzely - arXiv preprint arXiv:2311.10203, 2023 - arxiv.org
Machine learning assumes a pivotal role in our data-driven world. The increasing scale of models and datasets necessitates quick and reliable algorithms for model training. This …
Spectral Preconditioning for Gradient Methods on Graded Non-convex Functions
The performance of optimization methods is often tied to the spectrum of the objective Hessian. Yet, conventional assumptions, such as smoothness, often do not enable us to …
Convergence analysis of stochastic gradient descent with adaptive preconditioning for non-convex and convex functions
Preconditioning is a crucial operation in gradient-based numerical optimisation. It helps decrease the local condition number of a function by appropriately transforming its gradient …
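The preconditioning operation described above amounts to replacing the raw stochastic gradient g by P⁻¹g for a positive definite matrix P adapted to the local geometry. A common adaptive choice is a diagonal preconditioner built from accumulated squared gradients; the sketch below uses that choice purely as an example, and the paper's preconditioner and analysis may be different.

```python
import numpy as np

def preconditioned_sgd(grad_fn, x0, lr=0.1, n_iters=1000, eps=1e-8, rng=None):
    """SGD with a diagonal adaptive preconditioner (AdaGrad-style accumulator).

    grad_fn(x, rng) should return a stochastic gradient estimate.
    The diagonal accumulator is one illustrative preconditioner; the point is
    only that the update is x - lr * P^{-1} g rather than x - lr * g.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float)
    acc = np.zeros_like(x)                     # running sum of squared gradients
    for _ in range(n_iters):
        g = grad_fn(x, rng)
        acc += g * g
        x = x - lr * g / (np.sqrt(acc) + eps)  # preconditioned step
    return x
```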