Linear convergence of gradient and proximal-gradient methods under the Polyak-Łojasiewicz condition

H Karimi, J Nutini, M Schmidt - Joint European conference on machine …, 2016 - Springer
In 1963, Polyak proposed a simple condition that is sufficient to show a global linear
convergence rate for gradient descent. This condition is a special case of the Łojasiewicz …
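For context, the condition and the rate it yields can be sketched as follows (standard statement, with L the smoothness constant of f and f^* its minimum value; notation mine, not quoted from the paper):

    % PL condition and the resulting linear rate for gradient descent with step size 1/L
    \frac{1}{2}\,\|\nabla f(x)\|^2 \;\ge\; \mu\,\bigl(f(x) - f^*\bigr) \quad \text{for all } x
    \qquad\Longrightarrow\qquad
    f(x_k) - f^* \;\le\; \Bigl(1 - \tfrac{\mu}{L}\Bigr)^{k}\bigl(f(x_0) - f^*\bigr),
    \qquad\text{where } x_{k+1} = x_k - \tfrac{1}{L}\nabla f(x_k).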

On the convergence of decentralized gradient descent

K Yuan, Q Ling, W Yin - SIAM Journal on Optimization, 2016 - SIAM
Consider the consensus problem of minimizing f(x) = ∑_{i=1}^{n} f_i(x), where x ∈ R^p and each f_i
is only known to the individual agent i in a connected network of n agents. To solve this …
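A minimal sketch of the decentralized gradient descent iteration analyzed in this setting, assuming a doubly stochastic mixing matrix W supported on the network's edges and a common step size alpha (all identifiers are illustrative, not the paper's code):

    import numpy as np

    def decentralized_gd(grads, W, x0, alpha, iters):
        # grads: list of n callables, grads[i](x) is the gradient of f_i at x
        # W:     (n, n) doubly stochastic mixing matrix, W[i, j] > 0 only for
        #        neighboring agents i and j (and on the diagonal)
        # x0:    (n, p) array holding one local iterate per agent
        x = x0.copy()
        for _ in range(iters):
            mixed = W @ x                                    # average neighbors' iterates
            local = np.stack([g(xi) for g, xi in zip(grads, x)])
            x = mixed - alpha * local                        # then take a local gradient step
        return x

Roughly speaking, with a fixed alpha the local iterates only reach a neighborhood of the minimizer; the analysis quantifies the size of that neighborhood and the effect of a diminishing step size.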

Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm

D Needell, R Ward, N Srebro - Advances in neural …, 2014 - proceedings.neurips.cc
We improve a recent guarantee of Bach and Moulines on the linear convergence of SGD for
smooth and strongly convex objectives, reducing a quadratic dependence on the strong …
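For reference, the randomized Kaczmarz method for a consistent system Ax = b samples a row with probability proportional to its squared norm and projects the iterate onto that row's hyperplane; this weighted sampling is what the work relates to SGD with importance sampling. A small illustrative sketch (not the paper's code):

    import numpy as np

    def randomized_kaczmarz(A, b, iters=2000, seed=0):
        rng = np.random.default_rng(seed)
        m, n = A.shape
        row_norms2 = np.einsum("ij,ij->i", A, A)        # squared row norms
        probs = row_norms2 / row_norms2.sum()           # sampling weights
        x = np.zeros(n)
        for _ in range(iters):
            i = rng.choice(m, p=probs)
            a = A[i]
            x += (b[i] - a @ x) / row_norms2[i] * a     # project onto {x : <a, x> = b_i}
        return x

    # Tiny usage example on a consistent overdetermined system.
    A = np.random.default_rng(1).normal(size=(200, 20))
    x_hat = randomized_kaczmarz(A, A @ np.ones(20))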

Global convergence and variance reduction for a class of nonconvex-nonconcave minimax problems

J Yang, N Kiyavash, N He - Advances in Neural Information …, 2020 - proceedings.neurips.cc
Nonconvex minimax problems appear frequently in emerging machine learning
applications, such as generative adversarial networks and adversarial learning. Simple …

Understanding incremental learning of gradient descent: A fine-grained analysis of matrix sensing

J Jin, Z Li, K Lyu, SS Du, JD Lee - … Conference on Machine …, 2023 - proceedings.mlr.press
It is believed that Gradient Descent (GD) induces an implicit bias towards good
generalization in training machine learning models. This paper provides a fine-grained …

Convergence rates for the stochastic gradient descent method for non-convex objective functions

B Fehrman, B Gess, A Jentzen - Journal of Machine Learning Research, 2020 - jmlr.org
We prove the convergence to minima and estimates on the rate of convergence for the
stochastic gradient descent method in the case of not necessarily locally convex nor …

The implicit regularization of stochastic gradient flow for least squares

A Ali, E Dobriban, R Tibshirani - International conference on …, 2020 - proceedings.mlr.press
We study the implicit regularization of mini-batch stochastic gradient descent, when applied
to the fundamental problem of least squares regression. We leverage a continuous-time …
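As a point of reference for the continuous-time view, the full-batch gradient flow for the least squares objective f(β) = ‖y − Xβ‖²/(2n) is the ODE below, with the standard closed-form solution started from β(0) = 0; this is only a sketch of the deterministic baseline, while the paper itself studies a stochastic variant of this flow:

    % gradient flow for least squares and its solution from the origin
    \dot{\beta}(t) \;=\; \tfrac{1}{n}\,X^\top\bigl(y - X\beta(t)\bigr),
    \qquad
    \beta(t) \;=\; \bigl(X^\top X\bigr)^{+}\Bigl(I - e^{-t\,X^\top X/n}\Bigr)X^\top y.

Analyses in this line of work then compare the flow at time t with ridge regression at a penalty of roughly 1/t.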

On the lower bound of minimizing Polyak-Łojasiewicz functions

P Yue, C Fang, Z Lin - The Thirty Sixth Annual Conference …, 2023 - proceedings.mlr.press
The Polyak-Łojasiewicz (PL) condition (Polyak, 1963) is weaker than strong convexity but
suffices to ensure global convergence for the Gradient Descent …
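To make the contrast with strong convexity concrete, here is a tiny numerical check (illustrative only) using f(x) = x² + 3 sin²(x), a function commonly used in this literature as a non-convex example satisfying the PL condition:

    import numpy as np

    # f(x) = x^2 + 3 sin^2(x) is non-convex (f'' dips below zero) but satisfies the
    # PL condition, so gradient descent with step size 1/L still converges linearly
    # to the global minimum f* = 0.  Here f'' = 2 + 6 cos(2x) <= 8, so L = 8.
    def f(x):
        return x**2 + 3 * np.sin(x)**2

    def grad(x):
        return 2 * x + 3 * np.sin(2 * x)

    L, x = 8.0, 3.0
    gaps = []
    for _ in range(10):
        gaps.append(f(x))              # optimality gap f(x) - f*, with f* = 0
        x -= grad(x) / L

    ratios = np.array(gaps[1:]) / np.array(gaps[:-1])
    print(ratios.max())                # about 0.95: the gap shrinks by a constant factor per step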

SGD for structured nonconvex functions: Learning rates, minibatching and interpolation

R Gower, O Sebbouh, N Loizou - … Conference on Artificial …, 2021 - proceedings.mlr.press
Stochastic Gradient Descent (SGD) is being used routinely for optimizing non-
convex functions. Yet, the standard convergence theory for SGD in the smooth non-convex …

On exponential convergence of SGD in non-convex over-parametrized learning

R Bassily, M Belkin, S Ma - arXiv preprint arXiv:1811.02564, 2018 - arxiv.org
Large over-parametrized models learned via stochastic gradient descent (SGD) methods
have become a key element in modern machine learning. Although SGD methods are very …