Second-order optimization with lazy Hessians

N Doikov, M Jaggi - International Conference on Machine …, 2023 - proceedings.mlr.press
We analyze Newton's method with lazy Hessian updates for solving general possibly non-
convex optimization problems. We propose to reuse a previously seen Hessian for several …
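
A minimal sketch of the lazy-Hessian idea from the snippet above: plain Newton iterations that recompute the Hessian only every m steps and reuse it in between. The callables `grad` and `hess`, the reuse period `m`, and the small damping term are assumptions of this illustration, not the paper's implementation.

```python
import numpy as np

def lazy_newton(x0, grad, hess, m=5, n_iters=50, reg=1e-8):
    """Newton's method that refreshes the Hessian only every m iterations.

    grad, hess: callables returning the gradient vector and Hessian matrix.
    reg: small damping for numerical stability (an assumption of this sketch).
    """
    x = x0.copy()
    H = None
    for k in range(n_iters):
        if k % m == 0:                       # lazy Hessian update
            H = hess(x) + reg * np.eye(x.size)
        x = x - np.linalg.solve(H, grad(x))  # Newton step with the cached H
    return x

# toy usage on a strongly convex quadratic f(x) = 0.5 x^T A x - b^T x
rng = np.random.default_rng(0)
A = rng.standard_normal((10, 10)); A = A @ A.T + np.eye(10)
b = rng.standard_normal(10)
x_star = lazy_newton(np.zeros(10), grad=lambda x: A @ x - b, hess=lambda x: A)
```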

SANIA: Polyak-type optimization framework leads to scale invariant stochastic algorithms

F Abdukhakimov, C Xiang, D Kamzolov… - arXiv preprint arXiv …, 2023 - arxiv.org
Adaptive optimization methods are widely recognized as among the most popular
approaches for training Deep Neural Networks (DNNs). Techniques such as Adam …
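
The framework above builds on Polyak-type step sizes; a minimal sketch of the underlying stochastic Polyak step is below. The per-sample optimal value is assumed to be zero (interpolation), and the preconditioned, scale-invariant variants from the paper are not shown.

```python
import numpy as np

def sps_step(x, loss_i, grad_i, f_star_i=0.0, eps=1e-12):
    """One stochastic Polyak step on a single sampled loss.

    loss_i, grad_i: value and gradient of the sampled loss at x.
    f_star_i: per-sample optimum, taken as 0 under interpolation
    (an assumption of this sketch).
    """
    step_size = (loss_i - f_star_i) / (np.dot(grad_i, grad_i) + eps)
    return x - step_size * grad_i
```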

Advancing the lower bounds: An accelerated, stochastic, second-order method with optimal adaptation to inexactness

A Agafonov, D Kamzolov, A Gasnikov, A Kavis… - arXiv preprint arXiv …, 2023 - arxiv.org
We present a new accelerated stochastic second-order method that is robust to both
gradient and Hessian inexactness, which occurs typically in machine learning. We establish …

Accelerated adaptive cubic regularized quasi-Newton methods

D Kamzolov, K Ziu, A Agafonov… - arXiv preprint arXiv …, 2023 - researchgate.net
In this paper, we propose Cubic Regularized Quasi-Newton Methods for (strongly)
star-convex and Accelerated Cubic Regularized Quasi-Newton for convex optimization. The …
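
For orientation, a sketch of a single cubic-regularized step with a generic symmetric Hessian estimate B (e.g. a quasi-Newton matrix) is given below. The bisection subproblem solver and the assumption that B is positive definite are simplifications of this illustration, not the paper's accelerated scheme.

```python
import numpy as np

def cubic_step(g, B, M, iters=50):
    """Approximately minimize g^T h + 0.5 h^T B h + (M/6) * ||h||^3 over h.

    Uses the classical characterization (B + (M*r/2) I) h = -g with r = ||h||,
    found here by bisection on r. Assumes B is symmetric positive definite.
    """
    n = g.size

    def residual(r):
        h = np.linalg.solve(B + 0.5 * M * r * np.eye(n), -g)
        return np.linalg.norm(h) - r       # positive if r is still too small

    lo, hi = 0.0, 1.0
    while residual(hi) > 0:                # grow the bracket until sign change
        hi *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if residual(mid) > 0:
            lo = mid
        else:
            hi = mid
    r = 0.5 * (lo + hi)
    return np.linalg.solve(B + 0.5 * M * r * np.eye(n), -g)
```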

Minimizing quasi-self-concordant functions by gradient regularization of Newton method

N Doikov - arXiv preprint arXiv:2308.14742, 2023 - arxiv.org
We study the composite convex optimization problems with a Quasi-Self-Concordant smooth
component. This problem class naturally interpolates between classic Self-Concordant …
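
The gradient-regularization idea admits a very short sketch: a damped Newton step whose regularizer scales with a power of the current gradient norm. The constant c and the exponent (1/2 is the standard choice for objectives with Lipschitz Hessian; other smoothness classes use other powers) are assumptions here, not the paper's tuned values.

```python
import numpy as np

def grad_reg_newton_step(x, grad, hess, c=1.0, power=0.5):
    """One gradient-regularized Newton step:
        x_next = x - (hess(x) + lam * I)^{-1} grad(x),
    with lam = c * ||grad(x)||**power (constants are assumptions of this sketch).
    """
    g = grad(x)
    lam = c * np.linalg.norm(g) ** power
    return x - np.linalg.solve(hess(x) + lam * np.eye(x.size), g)
```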

OPTAMI: Global Superlinear Convergence of High-order Methods

D Kamzolov, D Pasechnyuk, A Agafonov… - arXiv preprint arXiv …, 2024 - arxiv.org
Second-order methods for convex optimization outperform first-order methods in terms of
theoretical iteration convergence, achieving rates up to $O(k^{-5})$ for highly-smooth …

Sketch-and-project meets Newton method: Global O(k^{-2}) convergence with low-rank updates

S Hanzely - 2023 - repository.kaust.edu.sa
In this paper, we propose the first sketch-and-project Newton method with fast O(k^{-2})
global convergence rate for self-concordant functions. Our method, SGN, can be viewed in …
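
A minimal sketch of a sketch-and-project style Newton step is shown below: the Newton system is solved only in a random low-dimensional subspace spanned by a sketch matrix S. The Gaussian sketch and the fixed damping are simplifications; SGN's step size comes from self-concordance and is not reproduced here.

```python
import numpy as np

def sketched_newton_step(x, grad, hess, tau=5, damping=1.0, rng=None):
    """x_next = x - damping * S (S^T H S)^{-1} S^T g  for a random n x tau sketch S."""
    rng = rng if rng is not None else np.random.default_rng()
    S = rng.standard_normal((x.size, tau))           # random sketch matrix
    g, H = grad(x), hess(x)
    reduced = np.linalg.solve(S.T @ H @ S, S.T @ g)  # only a tau x tau system
    return x - damping * (S @ reduced)
```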

Adaptive Optimization Algorithms for Machine Learning

S Hanzely - arXiv preprint arXiv:2311.10203, 2023 - arxiv.org
Machine learning assumes a pivotal role in our data-driven world. The increasing scale of
models and datasets necessitates quick and reliable algorithms for model training. This …

Spectral Preconditioning for Gradient Methods on Graded Non-convex Functions

N Doikov, SU Stich, M Jaggi - arXiv preprint arXiv:2402.04843, 2024 - arxiv.org
The performance of optimization methods is often tied to the spectrum of the objective
Hessian. Yet, conventional assumptions, such as smoothness, often do not enable us to …

Convergence analysis of stochastic gradient descent with adaptive preconditioning for non-convex and convex functions

DA Pasechnyuk, A Gasnikov, M Takáč - arXiv preprint arXiv:2308.14192, 2023 - arxiv.org
Preconditioning is a crucial operation in gradient-based numerical optimisation. It helps
decrease the local condition number of a function by appropriately transforming its gradient …
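
As one concrete instance of adaptive preconditioning, the sketch below rescales each coordinate of the stochastic gradient by the inverse square root of its accumulated squared magnitude (an AdaGrad-style diagonal preconditioner). This particular choice is an assumption for illustration; the paper analyzes a broader family of preconditioners.

```python
import numpy as np

def preconditioned_sgd(x0, sample_grad, n_iters=1000, lr=0.1, eps=1e-8):
    """SGD with a diagonal adaptive (AdaGrad-style) preconditioner."""
    x = x0.copy()
    accum = np.zeros_like(x)
    for _ in range(n_iters):
        g = sample_grad(x)                 # stochastic gradient at x
        accum += g * g                     # running sum of squared gradients
        x = x - lr * g / (np.sqrt(accum) + eps)
    return x
```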