Independently-Normalized SGD for Generalized-Smooth Nonconvex Optimization
Recent studies have shown that many nonconvex machine learning problems meet a so-called generalized-smooth condition that extends beyond traditional smooth nonconvex …
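For orientation, here is a minimal NumPy sketch of one plausible reading of "independently normalized": the update direction and its normalizer are estimated from independent stochastic gradients, so the two noise sources do not correlate. The toy objective, step size, and names are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def stochastic_grad(x, rng):
    # Toy stochastic gradient of f(x) = 0.25 * ||x||^4, a generalized-smooth
    # objective whose gradient is not globally Lipschitz, plus Gaussian noise.
    return (x @ x) * x + 0.1 * rng.standard_normal(x.shape)

def independently_normalized_sgd(x0, lr=0.1, steps=200):
    x = x0.copy()
    for _ in range(steps):
        g_dir = stochastic_grad(x, rng)   # sample 1: the update direction
        g_nrm = stochastic_grad(x, rng)   # sample 2: an independent normalizer
        x = x - lr * g_dir / (np.linalg.norm(g_nrm) + 1e-8)
    return x

x = independently_normalized_sgd(rng.standard_normal(5))
print("final ||x|| =", np.linalg.norm(x))
```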
SPAM: Stochastic Proximal Point Method with Momentum Variance Reduction for Non-convex Cross-Device Federated Learning
Cross-device training is a crucial subfield of federated learning, where the number of clients can reach into the billions. Standard approaches and local methods are prone to issues …
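As a rough illustration of the ingredients in the title, the sketch below combines a stochastic proximal point step with a STORM/MVR-style momentum estimator of the gradient. The quadratic "client" objectives, the recentered prox step, and all parameter values are assumptions chosen so the prox has a closed form; this is one plausible composition, not the paper's exact SPAM recursion.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy "clients": client i holds f_i(x) = 0.5 * ||x - c_i||^2, so the proximal
# step argmin_y f_i(y) + (1 / (2 * gamma)) * ||y - z||^2 has a closed form.
centers = rng.standard_normal((100, 5))

def grad(i, x):
    return x - centers[i]

def prox(i, z, gamma):
    return (z + gamma * centers[i]) / (1.0 + gamma)

def spam_like(x0, gamma=0.5, beta=0.9, steps=400):
    x = x0.copy()
    v = grad(rng.integers(len(centers)), x)   # initial gradient estimate
    for _ in range(steps):
        j = rng.integers(len(centers))        # sample one "client"
        # Proximal point step on f_j, recentered by the momentum estimate v
        # so the step uses variance-reduced rather than raw gradient signal.
        x_new = prox(j, x - gamma * (v - grad(j, x)), gamma)
        # MVR / STORM-style momentum update of the gradient estimate.
        v = grad(j, x_new) + beta * (v - grad(j, x))
        x = x_new
    return x

x = spam_like(rng.standard_normal(5))
print("distance to mean of client optima:", np.linalg.norm(x - centers.mean(axis=0)))
```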
Gradient Descent on Logistic Regression with Non-Separable Data and Large Step Sizes
We study gradient descent (GD) dynamics on logistic regression problems with large, constant step sizes. For linearly separable data, it is known that GD converges to the …
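The setting is easy to reproduce: the sketch below runs GD on logistic loss over non-separable synthetic data with a step size far above the classical 2/L stability threshold, the regime the abstract refers to. The data generator, horizon, and step-size multiple are illustrative assumptions, and early iterates may oscillate before the loss settles.

```python
import numpy as np

rng = np.random.default_rng(2)

# Non-separable toy data: labels come from a noisy linear rule, so some points
# violate every hyperplane and the logistic loss has a finite minimizer.
n, d = 200, 2
X = rng.standard_normal((n, d))
y = np.sign(X @ np.array([1.0, -1.0]) + rng.standard_normal(n))

def loss_and_grad(w):
    m = y * (X @ w)                             # margins
    loss = np.mean(np.logaddexp(0.0, -m))       # numerically stable logistic loss
    s = 0.5 * (1.0 - np.tanh(m / 2.0))          # sigmoid(-m), overflow-safe
    return loss, -(X * (y * s)[:, None]).mean(axis=0)

L = np.linalg.eigvalsh(X.T @ X).max() / (4 * n)  # global smoothness constant
eta = 20.0 / L                                   # far above the classical 2/L threshold
w = np.zeros(d)
for t in range(1000):
    loss, g = loss_and_grad(w)
    w -= eta * g
print("final loss:", loss)
```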
Mirror Descent Under Generalized Smoothness
Smoothness is crucial for attaining fast rates in first-order optimization. However, many optimization problems in modern machine learning involve non-smooth objectives. Recent …
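For reference, here is a minimal sketch of the classical mirror descent update the title builds on: entropic mirror descent (exponentiated gradient) on the probability simplex, where the negative-entropy mirror map turns each step into a multiplicative reweighting plus renormalization. The quadratic objective and step size are illustrative, and the sketch does not implement the paper's generalized-smoothness analysis.

```python
import numpy as np

rng = np.random.default_rng(3)

# Minimize f(p) = 0.5 * ||A p - b||^2 over the probability simplex.
A = rng.standard_normal((20, 10))
b = rng.standard_normal(20)

def grad(p):
    return A.T @ (A @ p - b)

p = np.full(10, 0.1)   # uniform starting point on the simplex
eta = 0.01             # illustrative step size
for _ in range(1000):
    g = grad(p)
    p = p * np.exp(-eta * (g - g.max()))  # subtract max for numerical stability
    p /= p.sum()                          # Bregman projection back to the simplex
print("final objective:", 0.5 * np.linalg.norm(A @ p - b) ** 2)
```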
Glocal Smoothness: Line Search can really help!
Iteration complexities are bounds on the number of iterations of an algorithm. Iteration complexities for first-order numerical optimization algorithms are typically stated in terms of a …
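The line search in question is the standard backtracking (Armijo) scheme, sketched below: start from an optimistic step size and shrink it until a sufficient-decrease test holds, so the accepted step tracks local rather than global smoothness. The test problem and parameter values are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def armijo_gd(f, grad_f, x0, c=1e-4, shrink=0.5, eta0=1.0, steps=100):
    # Gradient descent with backtracking (Armijo) line search.
    x = x0.copy()
    for _ in range(steps):
        g = grad_f(x)
        fx, eta = f(x), eta0
        # Halve the step until the sufficient-decrease condition holds.
        while f(x - eta * g) > fx - c * eta * (g @ g):
            eta *= shrink
        x = x - eta * g
    return x

# A badly conditioned quadratic, where any single safe global step size is slow.
H = np.diag([1.0, 100.0])
f = lambda x: 0.5 * x @ H @ x
grad_f = lambda x: H @ x
print("final objective:", f(armijo_gd(f, grad_f, np.array([1.0, 1.0]))))
```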