On the global convergence rates of softmax policy gradient methods
We make three contributions toward better understanding policy gradient methods in the
tabular setting. First, we show that with the true gradient, policy gradient with a softmax …
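For context, the tabular softmax parameterization referred to here assigns one logit per state-action pair, and the update uses the exact (true) policy gradient; a minimal sketch in standard notation (not necessarily the paper's) is
$$\pi_\theta(a \mid s) = \frac{\exp(\theta_{s,a})}{\sum_{a'} \exp(\theta_{s,a'})}, \qquad \theta_{t+1} = \theta_t + \eta \, \nabla_\theta V^{\pi_{\theta_t}}(\mu),$$
where $V^{\pi_\theta}(\mu)$ is the expected return under a starting-state distribution $\mu$.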
Fast convergence to non-isolated minima: four equivalent conditions for $C^2$ functions
Optimization algorithms can see their local convergence rates deteriorate when the Hessian
at the optimum is singular. These singularities are inescapable when the optima are non …
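A simple example of this situation (ours, not taken from the abstract): $f(x,y) = (xy - 1)^2$ attains its minimum on the whole curve $\{xy = 1\}$, and at any such point the Hessian
$$\begin{pmatrix} 2y^2 & 2 \\ 2 & 2x^2 \end{pmatrix}$$
has determinant $4(xy)^2 - 4 = 0$, i.e. it is singular along the direction tangent to the set of minimizers.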
[BOOK] Evaluation Complexity of Algorithms for Nonconvex Optimization: Theory, Computation and Perspectives
Do you know the difference between an optimist and a pessimist? The former believes we
live in the best possible world, and the latter is afraid that the former might be right.… In that …
SGD converges to global minimum in deep learning via star-convex path
Stochastic gradient descent (SGD) has been found to be surprisingly effective in training a
variety of deep neural networks. However, there is still a lack of understanding on how and …
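Roughly speaking, star-convexity toward a global minimizer $x^*$ asks that $f(\lambda x^* + (1-\lambda) x) \le \lambda f(x^*) + (1-\lambda) f(x)$ for all $x$ and $\lambda \in [0,1]$; the "star-convex path" condition studied here imposes an inequality of this kind only along the SGD iterates rather than on the whole landscape (a paraphrase, not the paper's exact definition).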
Stochastic second-order methods improve best-known sample complexity of SGD for gradient-dominated functions
We study the performance of Stochastic Cubic Regularized Newton (SCRN) on a class of
functions satisfying the gradient dominance property with $1 \le \alpha \le 2$, which holds in a …
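The gradient dominance property of order $\alpha$ is commonly stated as
$$f(x) - \min_{x'} f(x') \le \tau_f \, \|\nabla f(x)\|^{\alpha} \quad \text{for all } x,$$
with $\alpha = 2$ recovering the Polyak-Łojasiewicz condition; the constant and exact formulation may differ slightly from the paper's.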
Proximal gradient descent-ascent: variable convergence under KŁ geometry
The gradient descent-ascent (GDA) algorithm has been widely applied to solve minimax
optimization problems. In order to achieve convergent policy parameters for minimax …
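For reference, the Kurdyka-Łojasiewicz (KŁ) property with exponent $\theta \in [0, 1)$ can be written, in a simplified form, as
$$\mathrm{dist}\big(0, \partial f(x)\big) \ge c \,\big(f(x) - f(x^\star)\big)^{\theta}$$
locally around a critical point $x^\star$ for some $c > 0$; convergence rates of GDA-type methods are then expressed in terms of $\theta$ (this is the generic definition, not necessarily the exact variant used in the paper).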
Stochastic variance-reduced cubic regularization for nonconvex optimization
Cubic regularization (CR) is an optimization method with emerging popularity due to its
capability to escape saddle points and converge to second-order stationary solutions for …
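The core step of (stochastic) cubic regularization is the cubic-model subproblem; a generic form, with stochastic estimates replacing $\nabla f$ and $\nabla^2 f$ in the variance-reduced setting, is
$$s_k \in \arg\min_{s} \; \nabla f(x_k)^\top s + \tfrac{1}{2}\, s^\top \nabla^2 f(x_k)\, s + \tfrac{\sigma_k}{3} \|s\|^3, \qquad x_{k+1} = x_k + s_k.$$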
An accelerated proximal algorithm for regularized nonconvex and nonsmooth bi-level optimization
Many important machine learning applications involve regularized nonconvex bi-level
optimization. However, the existing gradient-based bi-level optimization algorithms cannot …
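A generic form of the regularized bi-level problems in question (our notation, not the paper's) is
$$\min_{x} \; f\big(x, y^*(x)\big) + g(x) \quad \text{subject to} \quad y^*(x) \in \arg\min_{y} h(x, y),$$
where $g$ is a possibly nonsmooth regularizer; the implicit dependence of $y^*(x)$ on the outer variable is what makes gradient-based methods delicate here.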
Cubic regularization with momentum for nonconvex optimization
Momentum is a popular technique to accelerate the convergence in practical training, and its
impact on convergence guarantee has been well-studied for first-order algorithms. However …
Fast convergence of trust-regions for non-isolated minima via analysis of CG on indefinite matrices
Trust-region methods (TR) can converge quadratically to minima where the Hessian is
positive definite. However, if the minima are not isolated, then the Hessian there cannot be …
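For reference, the trust-region subproblem at iterate $x_k$ with radius $\Delta_k$ is
$$\min_{\|s\| \le \Delta_k} \; \nabla f(x_k)^\top s + \tfrac{1}{2}\, s^\top \nabla^2 f(x_k)\, s,$$
usually solved approximately by (truncated) conjugate gradients, which is the CG-on-indefinite-matrices analysis alluded to in the title (a standard description, not the paper's specific algorithmic choices).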