On the global convergence rates of softmax policy gradient methods

J Mei, C Xiao, C Szepesvari… - … on machine learning, 2020 - proceedings.mlr.press
We make three contributions toward better understanding policy gradient methods in the
tabular setting. First, we show that with the true gradient, policy gradient with a softmax …
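
To make this concrete: in the single-state (bandit) special case, the objective is $V(\theta) = \sum_a \pi_\theta(a)\, r(a)$ with $\pi_\theta = \mathrm{softmax}(\theta)$, and the exact gradient is $\partial V/\partial \theta_a = \pi_\theta(a)\,(r(a) - \pi_\theta^\top r)$. A minimal sketch of one true-gradient step (the reward vector r and step size eta below are placeholders, not the paper's code):

import numpy as np

def softmax(theta):
    z = np.exp(theta - theta.max())  # subtract the max for numerical stability
    return z / z.sum()

def softmax_pg_step(theta, r, eta=0.1):
    # Exact-gradient ascent on V(theta) = pi . r for a one-state bandit:
    # dV/dtheta_a = pi_a * (r_a - pi . r)
    pi = softmax(theta)
    return theta + eta * pi * (r - pi @ r)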

Fast convergence to non-isolated minima: four equivalent conditions for functions

Q Rebjock, N Boumal - Mathematical Programming, 2024 - Springer
Optimization algorithms can see their local convergence rates deteriorate when the Hessian
at the optimum is singular. These singularities are inescapable when the optima are non …
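
One of the equivalent conditions at play in this line of work is the Polyak-Łojasiewicz (PŁ) inequality: for some $\mu > 0$, near the set of minimizers,

$f(x) - \min f \le \tfrac{1}{2\mu}\, \|\nabla f(x)\|^2 .$

Unlike positive definiteness of the Hessian, this can hold along a whole manifold of minima, which is what restores fast local rates when the minima are not isolated.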

[BOOK][B] Evaluation Complexity of Algorithms for Nonconvex Optimization: Theory, Computation and Perspectives

C Cartis, NIM Gould, PL Toint - 2022 - SIAM
Do you know the difference between an optimist and a pessimist? The former believes we
live in the best possible world, and the latter is afraid that the former might be right. … In that …

SGD converges to global minimum in deep learning via star-convex path

Y Zhou, J Yang, H Zhang, Y Liang, V Tarokh - arXiv preprint arXiv …, 2019 - arxiv.org
Stochastic gradient descent (SGD) has been found to be surprisingly effective in training a
variety of deep neural networks. However, there is still a lack of understanding on how and …
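
The star-convexity named in the title asks only that the convexity inequality hold against a global minimizer $x^*$ at each iterate $x$ along the optimization path:

$f(x^*) \ge f(x) + \langle \nabla f(x),\, x^* - x \rangle ,$

so the negative gradient never points away from $x^*$ on the path, even though $f$ may be nonconvex elsewhere.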

Stochastic second-order methods improve best-known sample complexity of SGD for gradient-dominated functions

S Masiha, S Salehkaleybar, N He… - Advances in …, 2022 - proceedings.neurips.cc
We study the performance of Stochastic Cubic Regularized Newton (SCRN) on a class of
functions satisfying the gradient dominance property with exponent $1 \le \alpha \le 2$, which holds in a …
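
For reference, the gradient dominance property with exponent $\alpha$ reads $f(x) - \min f \le c\, \|\nabla f(x)\|^{\alpha}$ for some $c > 0$; taking $\alpha = 2$ recovers the Polyak-Łojasiewicz inequality stated above.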

Proximal gradient descent-ascent: Variable convergence under KŁ geometry

Z Chen, Y Zhou, T Xu, Y Liang - arXiv preprint arXiv:2102.04653, 2021 - arxiv.org
The gradient descent-ascent (GDA) algorithm has been widely applied to solve minimax
optimization problems. In order to achieve convergent policy parameters for minimax …
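
A minimal sketch of the plain (non-proximal) GDA iteration on $\min_x \max_y f(x, y)$, assuming hypothetical gradient oracles and step sizes; the proximal variant studied in the paper additionally wraps each update in a proximal operator:

def gda_step(x, y, grad_x, grad_y, eta_x=0.01, eta_y=0.01):
    # Descend on x, ascend on y; grad_x and grad_y return the partial
    # gradients of f at (x, y).
    return x - eta_x * grad_x(x, y), y + eta_y * grad_y(x, y)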

Stochastic variance-reduced cubic regularization for nonconvex optimization

Z Wang, Y Zhou, Y Liang, G Lan - The 22nd International …, 2019 - proceedings.mlr.press
Cubic regularization (CR) is an optimization method with emerging popularity due to its
capability to escape saddle points and converge to second-order stationary solutions for …
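
Each CR iteration minimizes a cubic upper model of $f$ around the current point, $m(s) = \nabla f(x)^\top s + \tfrac{1}{2} s^\top \nabla^2 f(x)\, s + \tfrac{M}{6} \|s\|^3$. A small-scale illustrative sketch with a generic solver (practical CR codes use specialized subproblem solvers; g, H, and M below are placeholders):

import numpy as np
from scipy.optimize import minimize

def cubic_model(s, g, H, M):
    # m(s) = g.s + 0.5 s'Hs + (M/6)||s||^3
    return g @ s + 0.5 * s @ (H @ s) + (M / 6.0) * np.linalg.norm(s) ** 3

def cr_step(x, g, H, M=1.0):
    # Approximately minimize the cubic model from s = 0 and take the step.
    s = minimize(cubic_model, np.zeros_like(x), args=(g, H, M)).x
    return x + s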

An accelerated proximal algorithm for regularized nonconvex and nonsmooth bi-level optimization

Z Chen, B Kailkhura, Y Zhou - Machine Learning, 2023 - Springer
Many important machine learning applications involve regularized nonconvex bi-level
optimization. However, the existing gradient-based bi-level optimization algorithms cannot …
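
In one common formulation, the regularized bi-level problem here reads

$\min_x \; f\big(x, y^*(x)\big) + h(x) \quad \text{s.t.} \quad y^*(x) \in \arg\min_y \; g(x, y),$

where the upper-level regularizer $h$ may be nonsmooth, which is what calls for proximal (rather than plain gradient) updates on $x$.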

Cubic regularization with momentum for nonconvex optimization

Z Wang, Y Zhou, Y Liang, G Lan - Uncertainty in Artificial …, 2020 - proceedings.mlr.press
Momentum is a popular technique for accelerating convergence in practical training, and its
impact on convergence guarantees has been well studied for first-order algorithms. However …
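
For context, the first-order heavy-ball template that such momentum analyses build on, with placeholder step size and momentum parameters (a sketch, not the paper's algorithm, which couples momentum with cubic-regularized steps):

def momentum_step(x, v, grad, eta=0.01, beta=0.9):
    # Heavy-ball update: accumulate a velocity, then move along it.
    v = beta * v - eta * grad(x)
    return x + v, v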

Fast convergence of trust-regions for non-isolated minima via analysis of CG on indefinite matrices

Q Rebjock, N Boumal - Mathematical Programming, 2024 - Springer
Trust-region methods (TR) can converge quadratically to minima where the Hessian is
positive definite. However, if the minima are not isolated, then the Hessian there cannot be …
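
The CG in the title refers to the conjugate gradient inner loop applied to the (possibly indefinite) trust-region subproblem $\min_{\|s\| \le \Delta} g^\top s + \tfrac{1}{2} s^\top H s$. A compact Steihaug-Toint-style sketch, not the paper's implementation:

import numpy as np

def steihaug_cg(H, g, radius, tol=1e-8, max_iter=50):
    # Truncated CG for min g.s + 0.5 s'Hs subject to ||s|| <= radius.
    s = np.zeros_like(g)
    r = g.copy()   # residual of the Newton system H s = -g
    d = -r
    for _ in range(max_iter):
        Hd = H @ d
        curv = d @ Hd
        if curv <= 0:                          # negative curvature: hit the boundary
            return _to_boundary(s, d, radius)
        alpha = (r @ r) / curv
        s_next = s + alpha * d
        if np.linalg.norm(s_next) >= radius:   # step leaves the region
            return _to_boundary(s, d, radius)
        r_next = r + alpha * Hd
        if np.linalg.norm(r_next) < tol:
            return s_next
        d = -r_next + ((r_next @ r_next) / (r @ r)) * d
        s, r = s_next, r_next
    return s

def _to_boundary(s, d, radius):
    # Positive root tau of ||s + tau d|| = radius.
    a, b, c = d @ d, 2 * (s @ d), s @ s - radius**2
    tau = (-b + np.sqrt(b * b - 4 * a * c)) / (2 * a)
    return s + tau * d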