Nonconvex optimization meets low-rank matrix factorization: An overview

Y Chi, YM Lu, Y Chen - IEEE Transactions on Signal …, 2019 - ieeexplore.ieee.org
Substantial progress has been made recently on developing provably accurate and efficient
algorithms for low-rank matrix factorization via nonconvex optimization. While conventional …
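As context for this entry, the nonconvex formulation in question typically optimizes over the low-rank factors directly; a representative objective (notation ours, not quoted from the paper) for recovering a rank-r matrix M is

    min_{L in R^{n x r}, R in R^{m x r}}  f(L, R) = (1/2) || L R^T - M ||_F^2,

which is nonconvex in (L, R) even though the loss is a simple quadratic in the product L R^T.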

Stochastic gradient descent and its variants in machine learning

P Netrapalli - Journal of the Indian Institute of Science, 2019 - Springer
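For reference, the baseline update whose variants this survey covers is plain stochastic gradient descent (standard textbook form, not quoted from the paper):

    x_{t+1} = x_t - eta_t * g_t,   with E[g_t | x_t] = grad f(x_t),

where eta_t is the stepsize and g_t a stochastic estimate of the gradient.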

Spider: Near-optimal non-convex optimization via stochastic path-integrated differential estimator

C Fang, CJ Li, Z Lin, T Zhang - Advances in neural …, 2018 - proceedings.neurips.cc
In this paper, we propose a new technique named Stochastic Path-Integrated Differential
EstimatoR (SPIDER), which can be used to track many deterministic quantities of …
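The path-integrated estimator referred to here maintains a running gradient estimate by accumulating mini-batch gradient differences along the iterate path; in the usual notation (a sketch, notation ours),

    v_k = v_{k-1} + grad f_{S_k}(x_k) - grad f_{S_k}(x_{k-1}),

where S_k is the current mini-batch and v_k is periodically reset to a full (or large-batch) gradient.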

On the optimization of deep networks: Implicit acceleration by overparameterization

S Arora, N Cohen, E Hazan - International conference on …, 2018 - proceedings.mlr.press
Conventional wisdom in deep learning states that increasing depth improves
expressiveness but complicates optimization. This paper suggests that, sometimes …
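The overparameterization studied here adds depth without adding expressiveness: a linear map x -> W x is replaced by a product x -> W_N ... W_2 W_1 x, and gradient descent on the individual factors induces an acceleration-like effect on the end-to-end matrix W = W_N ... W_1 (a paraphrase of the setup; notation ours).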

Non-convex optimization for machine learning

P Jain, P Kar - Foundations and Trends® in Machine …, 2017 - nowpublishers.com
A vast majority of machine learning algorithms train their models and perform inference by
solving optimization problems. In order to capture the learning and prediction problems …

Theoretical insights into the optimization landscape of over-parameterized shallow neural networks

M Soltanolkotabi, A Javanmard… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
In this paper, we study the problem of learning a shallow artificial neural network that best
fits a training data set. We study this problem in the over-parameterized regime where the …
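Concretely, the fitting problem is of the least-squares form (generic shallow-network notation, ours):

    min_{W, v}  (1/2) * sum_{i=1}^{n} ( y_i - sum_{j=1}^{k} v_j * sigma(<w_j, x_i>) )^2,

with the number of hidden units k large enough that the network can interpolate the training data, i.e., the over-parameterized regime.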

Adagrad stepsizes: Sharp convergence over nonconvex landscapes

R Ward, X Wu, L Bottou - Journal of Machine Learning Research, 2020 - jmlr.org
Adaptive gradient methods such as AdaGrad and its variants update the stepsize in
stochastic gradient descent on the fly according to the gradients received along the way; …
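A minimal sketch of the stepsize rule the abstract alludes to, assuming the familiar coordinate-wise form of AdaGrad (the paper's analysis may focus on a scalar, norm-based variant; the function and parameter names below are illustrative):

    import numpy as np

    def adagrad(grad_fn, x0, lr=0.1, eps=1e-8, steps=1000):
        # Coordinate-wise AdaGrad: each coordinate's stepsize shrinks as its
        # squared gradients accumulate, so no decay schedule has to be tuned.
        x = np.array(x0, dtype=float)
        accum = np.zeros_like(x)              # running sum of squared gradients
        for _ in range(steps):
            g = grad_fn(x)                    # stochastic or exact gradient at x
            accum += g * g
            x -= lr * g / (np.sqrt(accum) + eps)
        return x

    # Example: minimize the quadratic f(x) = 0.5 * ||x||^2, whose gradient is x.
    x_star = adagrad(lambda x: x, x0=np.ones(5))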

The complexity of constrained min-max optimization

C Daskalakis, S Skoulakis, M Zampetakis - Proceedings of the 53rd …, 2021 - dl.acm.org
Despite its important applications in Machine Learning, min-max optimization of objective
functions that are nonconvex-nonconcave remains elusive. Not only are there no known first …
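The problem class in question (notation ours) is

    min_{x in X} max_{y in Y} f(x, y),

where f is nonconvex in x and nonconcave in y, and X, Y are constraint sets (e.g., compact convex sets).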

Global optimality guarantees for policy gradient methods

J Bhandari, D Russo - Operations Research, 2024 - pubsonline.informs.org
Policy gradient methods apply to complex, poorly understood control problems by
performing stochastic gradient descent over a parameterized class of policies. Unfortunately …
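The update described here is stochastic gradient ascent on the expected return J(theta), using the standard policy gradient identity (textbook form, not quoted from the paper):

    grad_theta J(theta) = E_{pi_theta} [ sum_t grad_theta log pi_theta(a_t | s_t) * Q^{pi_theta}(s_t, a_t) ].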

Accelerated methods for nonconvex optimization

Y Carmon, JC Duchi, O Hinder, A Sidford - SIAM Journal on Optimization, 2018 - SIAM
We present an accelerated gradient method for nonconvex optimization problems with
Lipschitz continuous first and second derivatives. In time O(ϵ^{-7/4} log(1/ϵ)), the method …
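The stated guarantee targets approximate stationarity: the method returns a point x with ||grad f(x)|| <= ϵ within the quoted O(ϵ^{-7/4} log(1/ϵ)) complexity (our reading of the abstract; the precise oracle model is specified in the paper).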