Fast convergence to non-isolated minima: four equivalent conditions for C² functions
Optimization algorithms can see their local convergence rates deteriorate when the Hessian
at the optimum is singular. These singularities are inescapable when the optima are non …
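The snippet breaks off before naming the conditions, but the phenomenon itself is easy to reproduce. Below is a minimal sketch (my own illustration, not code from the paper) on f(x, y) = x², whose minimizers form the whole line x = 0: the Hessian is singular at every optimum, yet a Polyak-Łojasiewicz inequality (‖∇f‖² = 4f) still gives plain gradient descent a linear rate.

    import numpy as np

    def f(z):
        # Minimizers form the non-isolated set {0} x R; the Hessian
        # diag(2, 0) is singular along all of it.
        return z[0] ** 2

    def grad(z):
        return np.array([2.0 * z[0], 0.0])

    z = np.array([1.0, 3.0])
    for _ in range(50):
        z = z - 0.25 * grad(z)  # step 1/L with L = 2: the x-coordinate halves each step
    print(f(z))  # ~1e-30, i.e. a linear rate despite the singular Hessian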
Smoothing the edges: A general framework for smooth optimization in sparse regularization using Hadamard overparametrization
This paper presents a framework for smooth optimization of objectives with ℓq and ℓp,q
regularization for (structured) sparsity. Finding solutions to these non-smooth and possibly …
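For the scalar ℓ1 case the trick is short to state: the minimum of (u² + v²)/2 over u·v = b equals |b|, so writing β = u ⊙ v turns the non-smooth lasso penalty into a smooth ridge term. The sketch below renders that identity in NumPy under assumed notation (plain lasso, plain gradient descent); it is not the paper's general structured framework.

    import numpy as np

    rng = np.random.default_rng(0)
    n, d, lam, lr = 50, 20, 0.1, 0.005
    X = rng.standard_normal((n, d))
    beta_true = np.zeros(d)
    beta_true[:3] = 1.0
    y = X @ beta_true

    # Smooth surrogate: 0.5*||X(u*v) - y||^2 + (lam/2)*(||u||^2 + ||v||^2),
    # whose minimizers match lasso solutions with penalty lam*||beta||_1.
    u = 0.1 * rng.standard_normal(d)
    v = 0.1 * rng.standard_normal(d)
    for _ in range(10000):
        g = X.T @ (X @ (u * v) - y)
        u, v = u - lr * (g * v + lam * u), v - lr * (g * u + lam * v)
    print(np.round(u * v, 2))  # approximately sparse: mass on the first 3 coordinates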
Algorithmic regularization in tensor optimization: towards a lifted approach in matrix sensing
Gradient descent (GD) is crucial for generalization in machine learning models, as it induces
implicit regularization, promoting compact representations. In this work, we examine the role …
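As a hedged illustration of the implicit-regularization claim (the standard symmetric matrix-sensing setting with a Burer-Monteiro factor, not necessarily the paper's lifted tensor formulation): gradient descent on an unconstrained full-width factor with tiny initialization still drifts toward the low-rank solution.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 10
    W = rng.standard_normal((n, 2))
    M = W @ W.T                              # rank-2 ground truth
    U = 1e-3 * rng.standard_normal((n, n))   # full-width factor, small initialization
    for _ in range(2000):
        U -= 0.01 * ((U @ U.T - M) @ U)      # gradient of 0.25*||U U^T - M||_F^2
    print(np.round(np.linalg.svd(U, compute_uv=False), 2))  # ~2 dominant singular values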
Benign nonconvex landscapes in optimal and robust control, Part I: Global optimality
Direct policy search has achieved great empirical success in reinforcement learning. Many
recent studies have revisited its theoretical foundation for continuous control, which reveals …
Over-parametrization via lifting for low-rank matrix sensing: Conversion of spurious solutions to strict saddle points
This paper studies the role of over-parametrization in solving non-convex optimization
problems. The focus is on the important class of low-rank matrix sensing, where we propose …
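In generic matrix-sensing notation (assumed here; the paper's lift may differ in form), over-parametrization raises the search rank k above the true rank r:

    \min_{U \in \mathbb{R}^{n \times k}} \; \tfrac{1}{2} \bigl\| \mathcal{A}(UU^{\top}) - b \bigr\|_2^2,
    \qquad k > r = \operatorname{rank}(X^{\star}),

the claimed benefit being that spurious solutions of the exactly-parametrized problem become strict saddle points, which first-order methods can escape.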
Continuation path learning for homotopy optimization
Homotopy optimization is a traditional method to deal with a complicated optimization
problem by solving a sequence of easy-to-hard surrogate subproblems. However, this …
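A minimal homotopy loop under assumed generic notation: blend an easy surrogate into the hard objective via h(x, t) = (1 - t)·easy(x) + t·hard(x) and warm-start each subproblem as t sweeps from 0 to 1. The paper's contribution is learning the continuation path rather than fixing a schedule like this one.

    import numpy as np

    def hard_grad(x):                   # target: hard(x) = x^2/20 + sin(3x), many local minima
        return x / 10.0 + 3.0 * np.cos(3.0 * x)

    def easy_grad(x):                   # surrogate: easy(x) = (x - 4)^2, convex
        return 2.0 * (x - 4.0)

    x = 0.0
    for t in np.linspace(0.0, 1.0, 21):
        for _ in range(200):            # inner solver: plain gradient descent, warm-started
            x -= 0.05 * ((1.0 - t) * easy_grad(x) + t * hard_grad(x))
    print(x, x ** 2 / 20.0 + np.sin(3.0 * x))  # a local minimum of the hard objective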
Geometry and optimization of shallow polynomial networks
We study shallow neural networks with polynomial activations. The function space for these
models can be identified with a set of symmetric tensors with bounded rank. We describe …
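For the simplest, quadratic-activation case the identification in the snippet is a one-line computation, sketched here as a worked example (higher-degree activations replace the matrix with a higher-order symmetric tensor):

    f(x) \;=\; \sum_{i=1}^{k} a_i \,(w_i^{\top} x)^2
         \;=\; x^{\top} \Bigl( \sum_{i=1}^{k} a_i \, w_i w_i^{\top} \Bigr) x,

so a width-k network realizes exactly the quadratic forms of symmetric matrices (order-2 symmetric tensors) of rank at most k.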
An apocalypse-free first-order low-rank optimization algorithm with at most one rank reduction attempt per iteration
We consider the problem of minimizing a differentiable function with locally Lipschitz
continuous gradient over the real determinantal variety, and present a first-order algorithm …
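The workhorse projection for the determinantal variety is truncated SVD. The sketch below shows only the basic projected-gradient iteration on a toy quadratic; the rank-reduction attempt that gives the paper's algorithm its "apocalypse-free" guarantee is omitted.

    import numpy as np

    def truncate(X, r):
        # Metric projection onto {X : rank(X) <= r} via truncated SVD.
        U, s, Vt = np.linalg.svd(X, full_matrices=False)
        return (U[:, :r] * s[:r]) @ Vt[:r]

    rng = np.random.default_rng(2)
    n, r = 8, 2
    M = truncate(rng.standard_normal((n, n)), r)   # rank-2 target
    X = np.zeros((n, n))
    for _ in range(100):
        X = truncate(X - 0.5 * (X - M), r)         # gradient of 0.5*||X - M||_F^2
    print(np.linalg.norm(X - M))                   # ~0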
From the simplex to the sphere: faster constrained optimization using the Hadamard parametrization
The standard simplex in ℝⁿ, also known as the probability simplex, is the set of nonnegative
vectors whose entries sum up to 1. It frequently appears as a constraint in optimization …
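The parametrization in the title is the squaring map (this much is standard; the toy problem and step size below are my own): a point y on the unit sphere maps to x = y ⊙ y on the simplex, since the squares are nonnegative and sum to ‖y‖² = 1, so a simplex-constrained problem becomes a smooth sphere-constrained one.

    import numpy as np

    c = np.array([3.0, 1.0, 2.0])   # minimize f(x) = <c, x> over the probability simplex

    y = np.ones(3) / np.sqrt(3.0)   # start at the uniform distribution
    for _ in range(500):
        y -= 0.05 * (2.0 * c * y)   # chain rule: gradient of f(y * y) in y is 2 * c * y
        y /= np.linalg.norm(y)      # retract back onto the unit sphere
    print(np.round(y * y, 3))       # ~[0, 1, 0]: all mass on the smallest cost entry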
First-order optimization on stratified sets
We consider the problem of minimizing a differentiable function with locally Lipschitz
continuous gradient on a stratified set and present a first-order algorithm designed to find a …
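A concrete stratified set, consistent with the determinantal-variety paper above (the general definition is assumed rather than quoted): the bounded-rank matrices decompose into smooth fixed-rank strata,

    \{ X \in \mathbb{R}^{m \times n} : \operatorname{rank} X \le r \}
      \;=\; \bigcup_{s=0}^{r} \{ X : \operatorname{rank} X = s \},

and the delicate case for a first-order method is an iterate approaching a stratum of lower rank, where the tangent-cone geometry changes.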