Nonconvex optimization meets low-rank matrix factorization: An overview
Substantial progress has been made recently on developing provably accurate and efficient
algorithms for low-rank matrix factorization via nonconvex optimization. While conventional …
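To make the generic approach concrete, here is a minimal sketch (an illustration, not the survey's own algorithm) of gradient descent run directly on the low-rank factors for matrix completion; the function name, step size, and initialization scale are all illustrative assumptions:

```python
import numpy as np

def factored_gd(M, mask, r, eta=0.01, iters=500, seed=0):
    """Gradient descent on the factors of a rank-r model X = U @ V.T,
    fitting only the observed entries of M. The objective
    0.5 * ||mask * (U V^T - M)||_F^2 is nonconvex in (U, V)."""
    rng = np.random.default_rng(seed)
    m, n = M.shape
    U = rng.standard_normal((m, r)) / np.sqrt(r)
    V = rng.standard_normal((n, r)) / np.sqrt(r)
    for _ in range(iters):
        R = mask * (U @ V.T - M)              # residual on observed entries only
        U, V = U - eta * (R @ V), V - eta * (R.T @ U)
    return U, V
```

Optimizing over the factors instead of the full matrix trades convexity for a variable count of (m + n) r rather than m n; when and why this trade still yields global guarantees is what the overview surveys.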
Stochastic gradient descent and its variants in machine learning
P Netrapalli - Journal of the Indian Institute of Science, 2019 - Springer
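For reference, the basic iteration the title refers to, as a minimal sketch; grad_fn, the constant step size, and the per-example sampling are illustrative choices (the survey covers many variants, e.g. momentum and variance reduction):

```python
import numpy as np

def sgd(grad_fn, w0, data, eta=0.1, epochs=10, seed=0):
    """Vanilla SGD: step along the gradient of the loss on one randomly
    chosen example, an unbiased but noisy estimate of the full gradient."""
    rng = np.random.default_rng(seed)
    w = w0.copy()
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            w = w - eta * grad_fn(w, data[i])
    return w
```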
Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator
In this paper, we propose a new technique named Stochastic Path-Integrated
Differential EstimatoR (SPIDER), which can be used to track many deterministic quantities of …
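A hedged sketch of the SPIDER estimator: refresh the gradient estimate with a full pass every q steps, and between refreshes track it recursively from small batches via v_k = v_{k-1} + mean over S of (grad_i(x_k) - grad_i(x_{k-1})). The constant step used here is a simplification (the paper pairs the estimator with a normalized step), and grad_i, q, and the batch size are illustrative:

```python
import numpy as np

def spider(grad_i, x0, n, q=50, batch=10, eta=0.01, iters=1000, seed=0):
    """SPIDER-style variance-reduced gradient tracking for a finite sum
    f(x) = (1/n) * sum_i f_i(x); grad_i(x, i) returns grad f_i(x)."""
    rng = np.random.default_rng(seed)
    x_prev, x = None, x0.copy()
    v = np.zeros_like(x0)
    for k in range(iters):
        if k % q == 0:
            # periodic refresh: exact full gradient at the current point
            v = np.mean([grad_i(x, i) for i in range(n)], axis=0)
        else:
            # recursive update from a small batch: cheap, and v stays close
            # to the true gradient as long as x moves slowly
            S = rng.integers(0, n, size=batch)
            v = v + np.mean([grad_i(x, i) - grad_i(x_prev, i) for i in S], axis=0)
        x_prev, x = x, x - eta * v
    return x
```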
On the optimization of deep networks: Implicit acceleration by overparameterization
Conventional wisdom in deep learning states that increasing depth improves
expressiveness but complicates optimization. This paper suggests that, sometimes …
Non-convex optimization for machine learning
P Jain, P Kar - Foundations and Trends® in Machine …, 2017 - nowpublishers.com
A vast majority of machine learning algorithms train their models and perform inference by
solving optimization problems. In order to capture the learning and prediction problems …
Theoretical insights into the optimization landscape of over-parameterized shallow neural networks
M Soltanolkotabi, A Javanmard… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
In this paper, we study the problem of learning a shallow artificial neural network that best
fits a training data set. We study this problem in the over-parameterized regime where the …
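To illustrate the setting, a minimal sketch of fitting a one-hidden-layer ReLU network by gradient descent, with the hidden width k chosen larger than the number of samples so the problem sits in the over-parameterized regime the paper studies. Training only the inner weights with the outer weights frozen is a simplification common in this literature, not necessarily the paper's exact model; all names and constants are illustrative:

```python
import numpy as np

def fit_shallow(X, y, k=200, eta=0.05, iters=2000, seed=0):
    """Fit f(x) = a . relu(W x) to (X, y) by gradient descent on W.
    X has shape (n, d); choose k > n for over-parameterization."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W = rng.standard_normal((k, d)) / np.sqrt(d)
    a = rng.choice([-1.0, 1.0], size=k) / np.sqrt(k)   # outer weights, frozen
    for _ in range(iters):
        H = np.maximum(X @ W.T, 0.0)                   # hidden activations, (n, k)
        r = H @ a - y                                  # residuals, (n,)
        gW = (np.outer(r, a) * (H > 0)).T @ X / n      # chain rule through ReLU
        W -= eta * gW
    return W, a
```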
Adagrad stepsizes: Sharp convergence over nonconvex landscapes
Adaptive gradient methods such as AdaGrad and its variants update the stepsize in
stochastic gradient descent on the fly according to the gradients received along the way; …
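A sketch of one update rule analyzed in this line of work, the scalar "AdaGrad-Norm" variant: a single stepsize eta / b_k, where b_k accumulates the squared norms of every gradient seen so far, so no stepsize schedule needs to be tuned by hand. Names and constants are illustrative:

```python
import numpy as np

def adagrad_norm(grad_fn, x0, eta=1.0, b0=1e-2, iters=1000):
    """AdaGrad-Norm: divide a base stepsize by the running root of the
    accumulated squared gradient norms."""
    x, b2 = x0.copy(), b0 ** 2
    for _ in range(iters):
        g = grad_fn(x)
        b2 += g @ g                     # accumulate ||g||^2
        x = x - (eta / np.sqrt(b2)) * g
    return x
```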
The complexity of constrained min-max optimization
Despite its important applications in Machine Learning, min-max optimization of objective
functions that are nonconvex-nonconcave remains elusive. Not only are there no known first …
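For context, the standard first-order baseline in this setting is simultaneous gradient descent-ascent, sketched below. This is an illustrative method, not one the paper proposes; on nonconvex-nonconcave objectives (already on the bilinear f(x, y) = x * y) this dynamic can cycle instead of converging, which is part of the difficulty the paper formalizes:

```python
import numpy as np

def gda(grad_x, grad_y, x0, y0, eta=0.01, iters=1000):
    """Simultaneous gradient descent-ascent on f(x, y):
    descend in the min player x, ascend in the max player y."""
    x, y = x0.copy(), y0.copy()
    for _ in range(iters):
        gx, gy = grad_x(x, y), grad_y(x, y)
        x, y = x - eta * gx, y + eta * gy
    return x, y
```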
Global optimality guarantees for policy gradient methods
J Bhandari, D Russo - Operations Research, 2024 - pubsonline.informs.org
Policy gradient methods apply to complex, poorly understood control problems by
performing stochastic gradient descent over a parameterized class of policies. Unfortunately …
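A minimal REINFORCE-style sketch of the method family the paper analyzes: stochastic gradient ascent on expected return over a softmax-parameterized policy, using the score-function (log-likelihood ratio) gradient estimator. The linear parameterization, the Gymnasium-style env.reset()/env.step() interface, and all constants are assumptions made here for illustration:

```python
import numpy as np

def reinforce(env, theta, n_actions, eta=0.01, episodes=500, gamma=0.99, seed=0):
    """Policy gradient with a linear softmax policy pi(a|s) over features s.
    theta has shape (obs_dim, n_actions); env follows the Gymnasium API."""
    rng = np.random.default_rng(seed)
    for _ in range(episodes):
        obs, _ = env.reset()
        traj, done = [], False
        while not done:                                 # roll out one episode
            logits = theta.T @ obs
            p = np.exp(logits - logits.max()); p /= p.sum()
            a = rng.choice(n_actions, p=p)
            obs_next, r, term, trunc, _ = env.step(a)
            traj.append((obs, a, p, r))
            obs, done = obs_next, term or trunc
        G = 0.0
        for obs_t, a_t, p_t, r_t in reversed(traj):     # returns-to-go
            G = r_t + gamma * G
            score = np.outer(obs_t, np.eye(n_actions)[a_t] - p_t)
            theta = theta + eta * G * score             # ascent step
    return theta
```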
Accelerated methods for nonconvex optimization
We present an accelerated gradient method for nonconvex optimization problems with
Lipschitz continuous first and second derivatives. In a time O(ε^{-7/4} log(1/ε)), the method …
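The paper's method couples accelerated gradient steps with negative-curvature exploitation and is more involved than fits here; the sketch below is only the plain Nesterov look-ahead iteration that such accelerated methods build on, run heuristically on a nonconvex objective with illustrative constants, not the paper's algorithm:

```python
import numpy as np

def nesterov(grad_fn, x0, eta=0.01, momentum=0.9, iters=1000):
    """Nesterov accelerated gradient: evaluate the gradient at the
    look-ahead point x + momentum * v instead of at x itself."""
    x, v = x0.copy(), np.zeros_like(x0)
    for _ in range(iters):
        g = grad_fn(x + momentum * v)   # gradient at the look-ahead point
        v = momentum * v - eta * g
        x = x + v
    return x
```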