Independently-Normalized SGD for Generalized-Smooth Nonconvex Optimization

Y Yang, E Tripp, Y Sun, S Zou, Y Zhou - arXiv preprint arXiv:2410.14054, 2024 - arxiv.org
Recent studies have shown that many nonconvex machine learning problems meet a
so-called generalized-smooth condition that extends beyond traditional smooth nonconvex …
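
For context, the generalized-smooth condition referenced here is usually the (L_0, L_1)-smoothness of Zhang et al. (2020); the exact variant assumed in this paper may differ. A standard statement:

```latex
% Standard (L_0, L_1)-generalized-smoothness (Zhang et al., 2020);
% the paper may use a slightly different variant.
\[
  \|\nabla^2 f(x)\| \;\le\; L_0 + L_1 \,\|\nabla f(x)\| \quad \text{for all } x,
\]
% or, in Hessian-free form,
\[
  \|\nabla f(x) - \nabla f(y)\| \;\le\; \big(L_0 + L_1\|\nabla f(x)\|\big)\,\|x - y\|
  \quad \text{whenever } \|x - y\| \le 1/L_1 .
\]
```

Setting L_1 = 0 recovers ordinary L_0-smoothness.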

SPAM: Stochastic Proximal Point Method with Momentum Variance Reduction for Non-convex Cross-Device Federated Learning

A Karagulyan, E Shulgin, A Sadiev… - arXiv preprint arXiv …, 2024 - arxiv.org
Cross-device training is a crucial subfield of federated learning, where the number of clients
can reach into the billions. Standard approaches and local methods are prone to issues …
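
For background, the stochastic proximal point method that SPAM builds on (per its title) replaces the gradient step with a regularized local subproblem; the momentum variance-reduction component is specific to the paper and not sketched here. The base update is

```latex
% Generic stochastic proximal point step; xi_t is the client/sample
% drawn at iteration t and gamma > 0 is the step size.
\[
  x_{t+1} \;=\; \operatorname*{arg\,min}_{x}
  \Big\{ f_{\xi_t}(x) + \tfrac{1}{2\gamma}\,\|x - x_t\|^2 \Big\}.
\]
```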

Gradient Descent on Logistic Regression with Non-Separable Data and Large Step Sizes

SY Meng, A Orvieto, DY Cao, C De Sa - arXiv preprint arXiv:2406.05033, 2024 - arxiv.org
We study gradient descent (GD) dynamics on logistic regression problems with large,
constant step sizes. For linearly separable data, it is known that GD converges to the …
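
A minimal sketch of the setting studied here (full-batch GD on logistic regression with a large constant step size), assuming labels in {-1, +1}; variable names are illustrative, and this is not the paper's analysis or code:

```python
import numpy as np
from scipy.special import expit  # numerically stable sigmoid

def gd_logistic(X, y, eta=10.0, iters=500):
    """Full-batch gradient descent on the average logistic loss with a
    large, constant step size eta. Labels y must be in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(iters):
        margins = y * (X @ w)
        # gradient of (1/n) * sum_i log(1 + exp(-y_i <x_i, w>))
        grad = -(X.T @ (y * expit(-margins))) / n
        w -= eta * grad
    return w

# Usage on synthetic non-separable data
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = np.sign(X @ rng.normal(size=5) + rng.normal(size=200))
w = gd_logistic(X, y)
```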

Mirror Descent Under Generalized Smoothness

D Yu, W Jiang, Y Wan, L Zhang - arXiv preprint arXiv:2502.00753, 2025 - arxiv.org
Smoothness is crucial for attaining fast rates in first-order optimization. However, many
optimization problems in modern machine learning involve non-smooth objectives. Recent …
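
For reference, the standard mirror descent update with mirror map ψ and its Bregman divergence is given below; the generalized-smoothness analysis is this paper's contribution and is not reproduced here:

```latex
% Mirror descent step with mirror map psi and step size eta.
\[
  x_{t+1} \;=\; \operatorname*{arg\,min}_{x \in \mathcal{X}}
  \Big\{ \eta\,\langle \nabla f(x_t),\, x \rangle + D_\psi(x, x_t) \Big\},
  \qquad
  D_\psi(x, y) \;=\; \psi(x) - \psi(y) - \langle \nabla\psi(y),\, x - y \rangle .
\]
```

Choosing ψ(x) = ½‖x‖₂² recovers projected gradient descent; the negative entropy on the simplex yields exponentiated gradient.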

Glocal Smoothness: Line Search can really help!

C Fox, M Schmidt - OPT 2024: Optimization for Machine Learning - openreview.net
Iteration complexities are bounds on the number of iterations of an algorithm. Iteration
complexities for first-order numerical optimization algorithms are typically stated in terms of a …
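
The "line search" in the title refers to standard step-size selection such as Armijo backtracking; a minimal textbook sketch follows (not the glocal-smoothness-aware procedure the paper studies, and all names are illustrative):

```python
import numpy as np

def armijo_backtracking(f, grad_f, x, eta0=1.0, beta=0.5, c=1e-4, max_halvings=50):
    """Standard Armijo backtracking line search: halve the step until the
    sufficient-decrease condition f(x - eta*g) <= f(x) - c*eta*||g||^2 holds."""
    g = grad_f(x)
    fx = f(x)
    eta = eta0
    for _ in range(max_halvings):
        if f(x - eta * g) <= fx - c * eta * np.dot(g, g):
            return x - eta * g, eta
        eta *= beta
    return x - eta * g, eta  # fall back to the smallest tried step

# Usage sketch on a simple quadratic
f = lambda x: 0.5 * np.dot(x, x)
grad_f = lambda x: x
x_new, eta = armijo_backtracking(f, grad_f, np.array([3.0, -4.0]))
```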