Fast convergence to non-isolated minima: four equivalent conditions for C² functions

Q Rebjock, N Boumal - Mathematical Programming, 2024 - Springer
Optimization algorithms can see their local convergence rates deteriorate when the Hessian
at the optimum is singular. These singularities are inescapable when the optima are non …
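
A one-line example (added for illustration) shows why: whenever the minimizers form a continuum, the Hessian must be degenerate along it.

```latex
% Minimal illustrative example: f attains its minimum on the whole
% line {(0, y)}, a non-isolated set of optima.
\[
  f(x, y) = x^2, \qquad
  \nabla^2 f(x, y) = \begin{pmatrix} 2 & 0 \\ 0 & 0 \end{pmatrix}.
\]
% Moving along the minimizing line leaves f unchanged, so the Hessian
% has a zero eigenvalue in that tangent direction at every optimum.
```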

Smoothing the edges: A general framework for smooth optimization in sparse regularization using Hadamard overparametrization

C Kolb, CL Müller, B Bischl… - arXiv preprint arXiv …, 2023 - researchgate.net
This paper presents a framework for smooth optimization of objectives with ℓq and ℓp,q
regularization for (structured) sparsity. Finding solutions to these non-smooth and possibly …
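
For context, a minimal sketch of the underlying trick, not the paper's full framework: writing x = u ⊙ v (Hadamard product) converts a non-smooth ℓ1 penalty into smooth ridge penalties on the factors, via the classical identity ‖x‖₁ = min{(‖u‖² + ‖v‖²)/2 : u ⊙ v = x}. The lasso instance and all names below are illustrative.

```python
import numpy as np

# Illustrative lasso instance (not from the paper).
rng = np.random.default_rng(0)
A = rng.standard_normal((20, 50))
b = rng.standard_normal(20)
lam = 0.1

# Smooth surrogate: 0.5*||A(u*v) - b||^2 + (lam/2)*(||u||^2 + ||v||^2).
# At its minimizers this matches the lasso penalty lam*||u*v||_1.
u = np.ones(50)
v = np.ones(50)
lr = 1e-3
for _ in range(5000):
    g = A.T @ (A @ (u * v) - b)   # gradient of the data fit w.r.t. x = u ⊙ v
    gu = g * v + lam * u          # chain rule: d/du of the surrogate
    gv = g * u + lam * v          # chain rule: d/dv of the surrogate
    u -= lr * gu
    v -= lr * gv

print("approx. support size:", int(np.sum(np.abs(u * v) > 1e-3)))
```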

Algorithmic regularization in tensor optimization: towards a lifted approach in matrix sensing

Z Ma, J Lavaei, S Sojoudi - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Gradient descent (GD) is crucial for generalization in machine learning models, as it induces
implicit regularization, promoting compact representations. In this work, we examine the role …

Benign nonconvex landscapes in optimal and robust control, Part I: Global optimality

Y Zheng, C Pai, Y Tang - arXiv preprint arXiv:2312.15332, 2023 - arxiv.org
Direct policy search has achieved great empirical success in reinforcement learning. Many
recent studies have revisited its theoretical foundation for continuous control, which reveals …

Over-parametrization via lifting for low-rank matrix sensing: Conversion of spurious solutions to strict saddle points

Z Ma, I Molybog, J Lavaei… - … Conference on Machine …, 2023 - proceedings.mlr.press
This paper studies the role of over-parametrization in solving non-convex optimization
problems. The focus is on the important class of low-rank matrix sensing, where we propose …
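
For orientation, the base formulation that over-parametrization starts from is sketched below; the paper's tensor lifting itself is not reproduced here. Searching over a factor with k > r columns can turn spurious second-order critical points of the exactly parametrized problem into strict saddles, which first-order methods escape.

```latex
% Burer-Monteiro-style matrix sensing with an over-parametrized search
% rank k (illustrative notation; X* is the ground-truth matrix):
\[
  \min_{U \in \mathbb{R}^{n \times k}}
  \sum_{i=1}^{m} \bigl( \langle A_i, U U^{\top} \rangle - b_i \bigr)^2,
  \qquad k > r = \operatorname{rank}(X^{\star}).
\]
```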

Continuation path learning for homotopy optimization

X Lin, Z Yang, X Zhang… - … Conference on Machine …, 2023 - proceedings.mlr.press
Homotopy optimization is a traditional method to deal with a complicated optimization
problem by solving a sequence of easy-to-hard surrogate subproblems. However, this …
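
A minimal sketch of the classical homotopy loop described here (the surrogate, schedule, and step sizes are illustrative): each subproblem interpolates between an easy convex objective at t = 0 and the hard target at t = 1, and is warm-started at the previous solution.

```python
import numpy as np

def grad_surrogate(x, t):
    # f_t(x) = (1 - t) * 0.5 * x**2 + t * (x**2 - 1)**2:
    # convex at t = 0, the nonconvex target at t = 1 (illustrative).
    return (1 - t) * x + t * (4 * x**3 - 4 * x)

def homotopy_minimize(x0, ts, lr=0.01, steps=500):
    x = x0
    for t in ts:                      # easy-to-hard schedule
        for _ in range(steps):        # approximately solve subproblem t
            x -= lr * grad_surrogate(x, t)
        # x now warm-starts the next, harder subproblem
    return x

x = homotopy_minimize(x0=0.1, ts=np.linspace(0.0, 1.0, 6))
print(x)  # close to the target's minimizer x = 1
```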

Geometry and optimization of shallow polynomial networks

Y Arjevani, J Bruna, J Kileel, E Polak… - arXiv preprint arXiv …, 2025 - arxiv.org
We study shallow neural networks with polynomial activations. The function space for these
models can be identified with a set of symmetric tensors with bounded rank. We describe …
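
The identification mentioned here is the standard one for polynomial activations: a width-k network with activation t ↦ t^d computes

```latex
\[
  f(x) \;=\; \sum_{i=1}^{k} a_i \langle w_i, x \rangle^{d}
       \;=\; \Bigl\langle \sum_{i=1}^{k} a_i\, w_i^{\otimes d},\; x^{\otimes d} \Bigr\rangle,
\]
% so the function space is identified with the symmetric tensors of
% symmetric rank at most k, i.e. bounded rank.
```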

An apocalypse-free first-order low-rank optimization algorithm with at most one rank reduction attempt per iteration

G Olikier, PA Absil - arXiv preprint arXiv:2208.12051, 2022 - arxiv.org
We consider the problem of minimizing a differentiable function with locally Lipschitz
continuous gradient over the real determinantal variety, and present a first-order algorithm …
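
For context, the feasible set is the real determinantal variety of bounded-rank matrices, and an "apocalypse" (in the sense of Levin, Kileel, and Boumal) is, roughly, a sequence along which a stationarity measure vanishes while the limit point is not stationary, which can happen where the rank drops.

```latex
% The real determinantal variety and the problem class (standard notation):
\[
  \mathbb{R}^{m \times n}_{\le r}
  = \{ X \in \mathbb{R}^{m \times n} : \operatorname{rank}(X) \le r \},
  \qquad
  \min_{X \in \mathbb{R}^{m \times n}_{\le r}} f(X).
\]
```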

From the simplex to the sphere: faster constrained optimization using the Hadamard parametrization

Q Li, D McKenzie, W Yin - … and Inference: A Journal of the IMA, 2023 - academic.oup.com
The standard simplex in ℝⁿ, also known as the probability simplex, is the set of nonnegative
vectors whose entries sum up to 1. It frequently appears as a constraint in optimization …
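
A minimal sketch of the parametrization in the title (all names illustrative): x = y ⊙ y maps the unit sphere onto the simplex, since the entries y_i² are nonnegative and sum to ‖y‖² = 1, so a simplex constraint becomes a sphere constraint.

```python
import numpy as np

def sphere_to_simplex(y):
    # If ||y||_2 = 1, then y * y is entrywise nonnegative and sums to 1.
    return y * y

def riemannian_step(y, grad_f, lr=0.1):
    # One projected gradient step on the unit sphere for min f(y * y).
    g = 2 * y * grad_f(y * y)       # chain rule through x = y ⊙ y
    g = g - (g @ y) * y             # project onto the sphere's tangent space
    y = y - lr * g
    return y / np.linalg.norm(y)    # retract back onto the sphere

# Toy usage: minimize <c, x> over the simplex (illustrative objective).
c = np.array([3.0, 1.0, 2.0])
y = np.ones(3) / np.sqrt(3.0)
for _ in range(200):
    y = riemannian_step(y, grad_f=lambda x: c)
print(sphere_to_simplex(y))  # mass concentrates on argmin_i c_i
```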

First-order optimization on stratified sets

G Olikier, KA Gallivan, PA Absil - arXiv preprint arXiv:2303.16040, 2023 - arxiv.org
We consider the problem of minimizing a differentiable function with locally Lipschitz
continuous gradient on a stratified set and present a first-order algorithm designed to find a …
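
A concrete instance ties this to the determinantal variety above: it is stratified by the smooth manifolds of fixed-rank matrices,

```latex
\[
  \mathbb{R}^{m \times n}_{\le r}
  = \bigsqcup_{s=0}^{r} \{ X \in \mathbb{R}^{m \times n} : \operatorname{rank}(X) = s \},
\]
% each stratum {rank = s} being a smooth embedded submanifold.
```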