High-dimensional limit theorems for SGD: Effective dynamics and critical scaling

G Ben Arous, R Gheissari… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the scaling limits of stochastic gradient descent (SGD) with constant step-size in
the high-dimensional regime. We prove limit theorems for the trajectories of summary …
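
A schematic of the statement type (notation assumed here, not taken from the snippet): for iterates $X_\ell \in \mathbb{R}^n$ with constant step size $\delta$ and a finite family of summary statistics $u : \mathbb{R}^n \to \mathbb{R}^k$, such limit theorems take the form

\[
u\big(X_{\lfloor tn \rfloor}\big) \longrightarrow \bar u_t, \qquad
d\bar u_t = f(\bar u_t)\,dt \quad \text{(ballistic ODE limit at small step size)},
\]
\[
d\bar u_t = f(\bar u_t)\,dt + \Sigma^{1/2}(\bar u_t)\,dW_t \quad \text{(diffusive SDE limit at the critical scaling)},
\]

where $f$ and $\Sigma$ are an effective drift and diffusion; the "critical scaling" of the title is the step-size regime at which the noise term survives in the limit.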

Online stochastic gradient descent on non-convex losses from high-dimensional inference

GB Arous, R Gheissari, A Jagannath - Journal of Machine Learning …, 2021 - jmlr.org
Stochastic gradient descent (SGD) is a popular algorithm for optimization problems arising
in high-dimensional inference tasks. Here one produces an estimator of an unknown …
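
A minimal runnable toy in the spirit of this setting; the model (noiseless phase retrieval), the spherical projection, and all constants are illustrative assumptions, not the paper's exact setup:

```python
# Online (one-pass) SGD: each step consumes a fresh sample.
import numpy as np

rng = np.random.default_rng(0)
n = 1000                                   # dimension
delta = 0.5 / n                            # constant step size, scaled with n
v = rng.standard_normal(n); v /= np.linalg.norm(v)   # unknown signal direction
x = rng.standard_normal(n); x /= np.linalg.norm(x)   # random start on the sphere

for step in range(1, 20 * n + 1):
    a = rng.standard_normal(n)             # fresh data point
    y = (a @ v) ** 2                       # phase-retrieval-style label
    pred = (a @ x) ** 2
    grad = (pred - y) * 2.0 * (a @ x) * a  # gradient of 0.5 * (pred - y)**2 in x
    x -= delta * grad
    x /= np.linalg.norm(x)                 # keep the estimator on the sphere
    if step % (5 * n) == 0:
        print(step, "overlap with signal:", abs(x @ v))
```

Whether the overlap escapes its initial $O(n^{-1/2})$ scale within these $20n$ samples is exactly the kind of sample-complexity threshold question results of this type answer.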

The benefits of reusing batches for gradient descent in two-layer networks: Breaking the curse of information and leap exponents

Y Dandi, E Troiani, L Arnaboldi, L Pesce… - arXiv preprint arXiv …, 2024 - arxiv.org
We investigate the training dynamics of two-layer neural networks when learning multi-index
target functions. We focus on multi-pass gradient descent (GD) that reuses the batches …
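
A toy contrast between single-pass GD (a fresh batch every step) and multi-pass GD on one reused batch; the multi-index target, widths, and learning rate below are illustrative assumptions:

```python
# Two-layer network trained on a product-of-ridges target; compare
# fresh batches per step (single-pass) against reusing one batch.
import numpy as np

rng = np.random.default_rng(1)
n, width, batch = 50, 32, 256
W_tgt = rng.standard_normal((2, n)) / np.sqrt(n)        # two relevant directions
f_star = lambda X: np.prod(np.tanh(X @ W_tgt.T), axis=1)

def train(reuse_batch, steps=500, lr=0.1):
    W = rng.standard_normal((width, n)) / np.sqrt(n)
    a = rng.standard_normal(width) / np.sqrt(width)
    X = rng.standard_normal((batch, n)); y = f_star(X)
    for _ in range(steps):
        if not reuse_batch:                              # single-pass: fresh data
            X = rng.standard_normal((batch, n)); y = f_star(X)
        H = np.tanh(X @ W.T)                             # hidden activations
        err = H @ a - y                                  # residuals
        a -= lr * H.T @ err / batch
        W -= lr * ((err[:, None] * a[None, :] * (1 - H**2)).T @ X) / batch
    Xt = rng.standard_normal((4000, n))
    return np.mean((np.tanh(Xt @ W.T) @ a - f_star(Xt)) ** 2)

print("single-pass test MSE:", train(reuse_batch=False))
print("multi-pass  test MSE:", train(reuse_batch=True))
```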

On the impact of overparameterization on the training of a shallow neural network in high dimensions

S Martin, F Bach, G Biroli - International Conference on …, 2024 - proceedings.mlr.press
We study the training dynamics of a shallow neural network with quadratic activation
functions and quadratic cost in a teacher-student setup. In line with previous works on the …
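
For context on why quadratic activations are tractable (a standard observation, with notation assumed here): with $\sigma(z) = z^2$ the network depends on its weights only through a second-moment matrix,

\[
\hat f(x) = \sum_{i=1}^{m} a_i (w_i \cdot x)^2 = x^\top M x,
\qquad M = \sum_{i=1}^{m} a_i\, w_i w_i^\top,
\]

so the quadratic-cost teacher-student problem reduces to fitting $M$ to the teacher's $M^* = \sum_{j} w_j^* (w_j^*)^\top$, and overparameterization acts through the rank that $M$ is allowed to have.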

Sudakov–Fernique post-AMP, and a new proof of the local convexity of the TAP free energy

M Celentano - The Annals of Probability, 2024 - projecteuclid.org
We develop an approach for studying the local convexity of a certain class of random
objectives around the iterates of an AMP algorithm. Our approach involves applying the …
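
For concreteness, a minimal AMP iteration with its Onsager correction on a rank-one spiked matrix model; the model, the tanh denoiser, and the slightly informative initialization are illustrative assumptions, not the paper's construction:

```python
# AMP for a rank-one spike in a Wigner matrix, with Onsager correction.
import numpy as np

rng = np.random.default_rng(2)
n, lam = 2000, 2.0
v = rng.choice([-1.0, 1.0], size=n)                # planted +-1 signal
W = rng.standard_normal((n, n)) / np.sqrt(n)
W = (W + W.T) / np.sqrt(2.0)                       # symmetric (Wigner) noise
A = (lam / n) * np.outer(v, v) + W                 # observed spiked matrix

f = np.tanh                                        # denoiser (assumed choice)
x_prev = np.zeros(n)
x = 0.1 * v + rng.standard_normal(n)               # slightly informative start
for t in range(15):
    m = f(x)
    b = np.mean(1.0 - m**2)                        # Onsager term: average f'(x)
    x, x_prev = A @ m - b * f(x_prev), x           # AMP update
    print(t, "overlap:", abs(f(x) @ v) / n)
```

The Onsager correction is what makes each iterate behave like the signal plus an asymptotically independent Gaussian, the structure exploited when studying objectives "around the iterates of an AMP algorithm".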

Statistical limits of dictionary learning: random matrix theory and the spectral replica method

J Barbier, N Macris - Physical Review E, 2022 - APS
We consider increasingly complex models of matrix denoising and dictionary learning in the
Bayes-optimal setting, in the challenging regime where the matrices to infer have a rank …
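
A generic form of the two model classes (conventions assumed here; the paper's precise scalings may differ):

\[
Y = \frac{1}{\sqrt{N}} X X^\top + \sqrt{\Delta}\, Z \quad \text{(matrix denoising)},
\qquad
Y = \frac{1}{\sqrt{N}} F X + \sqrt{\Delta}\, Z \quad \text{(dictionary learning)},
\]

with $X \in \mathbb{R}^{N \times M}$ and $M / N \to \alpha > 0$, i.e. the extensive-rank regime that makes the problem hard and motivates the spectral replica method.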

Rethinking Mean-Field Glassy Dynamics and Its Relation with the Energy Landscape: The Surprising Case of the Spherical Mixed p-Spin Model

G Folena, S Franz, F Ricci-Tersenghi - Physical Review X, 2020 - APS
The spherical p-spin model is a fundamental model in statistical mechanics of a disordered
system with a random first-order transition. The dynamics of this model is interesting both for …
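
The model in the title (standard definition; "mixed" refers to combining several interaction orders $p$):

\[
H_N(\sigma) = \sum_{p \ge 2} c_p\, H_N^{(p)}(\sigma),
\qquad
H_N^{(p)}(\sigma) = \frac{1}{N^{(p-1)/2}} \sum_{i_1, \dots, i_p = 1}^{N} J_{i_1 \cdots i_p}\, \sigma_{i_1} \cdots \sigma_{i_p},
\]

with i.i.d. standard Gaussian couplings $J_{i_1 \cdots i_p}$ and the spherical constraint $\sum_{i} \sigma_i^2 = N$.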

Quantitative propagation of chaos for SGD in wide neural networks

V De Bortoli, A Durmus, X Fontaine… - Advances in Neural …, 2020 - proceedings.neurips.cc
In this paper, we investigate the limiting behavior of a continuous-time counterpart of the
Stochastic Gradient Descent (SGD) algorithm applied to two-layer overparameterized neural …
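
A schematic of what propagation of chaos asserts in this context (notation assumed here): as the width $m$ grows, the empirical measure of the neurons converges to a deterministic mean-field limit and individual neurons decouple,

\[
\mu_t^m = \frac{1}{m} \sum_{i=1}^{m} \delta_{\theta_i(t)} \xrightarrow[m \to \infty]{} \mu_t,
\qquad
\partial_t \mu_t = \nabla_\theta \cdot \big( \mu_t\, \nabla_\theta \Psi(\theta; \mu_t) \big),
\]

where $\Psi$ is the effective potential a single neuron feels given the population; "quantitative" means explicit convergence rates in the width.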

Landscape complexity for the empirical risk of generalized linear models

A Maillard, GB Arous, G Biroli - Mathematical and Scientific …, 2020 - proceedings.mlr.press
We present a method to obtain the average and the typical value of the number of critical
points of the empirical risk landscape for generalized linear estimation problems and …
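
The average count in results of this kind is computed with the Kac–Rice formula (standard statement, notation assumed): for a smooth random risk $\mathcal{L}$ over a region $B$,

\[
\mathbb{E}\big[\#\{ w \in B : \nabla \mathcal{L}(w) = 0 \}\big]
= \int_B \mathbb{E}\Big[ \big|\det \nabla^2 \mathcal{L}(w)\big| \,\Big|\, \nabla \mathcal{L}(w) = 0 \Big]\, p_{\nabla \mathcal{L}(w)}(0)\, dw,
\]

where $p_{\nabla \mathcal{L}(w)}$ is the density of the gradient at $w$; the typical (rather than average) count additionally requires controlling fluctuations beyond this annealed first moment.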

Random tensor theory for tensor decomposition

M Ouerfelli, M Tamaazousti, V Rivasseau - Proceedings of the AAAI …, 2022 - ojs.aaai.org
We propose a new framework for tensor decomposition based on trace invariants, which are
particular cases of tensor networks. In general, tensor networks are diagrams/graphs that …
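
A runnable toy showing what a trace invariant is in practice; the spiked tensor, the particular contractions, and the spectral read-out are illustrative assumptions, one instance of the general framework:

```python
# Trace invariants of an order-3 tensor: contractions in which each
# index slot ("color") is contracted only with the same slot.
import numpy as np

rng = np.random.default_rng(3)
n, beta = 30, 5.0                         # beta chosen so the spike is visible
u = rng.standard_normal(n); u /= np.linalg.norm(u)
T = beta * np.einsum('i,j,k->ijk', u, u, u) \
    + rng.standard_normal((n, n, n)) / np.sqrt(n)    # spiked random tensor

I2 = np.einsum('ijk,ijk->', T, T)                    # degree-2 invariant
I4 = np.einsum('ijk,ljk,lmn,imn->', T, T, T, T,
               optimize=True)                        # a degree-4 ("melonic") invariant
print("I2:", I2, " I4:", I4)

# A matrix built from one such contraction; its top eigenvector serves
# as a spectral estimate of the planted vector u.
M = np.einsum('ijk,ljk->il', T, T)
w, V = np.linalg.eigh(M)
print("overlap with planted vector:", abs(V[:, -1] @ u))
```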