High-dimensional asymptotics of feature learning: How one gradient step improves the representation

J Ba, MA Erdogdu, T Suzuki, Z Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a
two-layer neural network: $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}} \boldsymbol{a}^\top \sigma$ …
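
A minimal NumPy sketch of this setting (the Gaussian inputs, ReLU single-index teacher, square loss, and width-scaled learning rate below are illustrative assumptions, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, n = 128, 256, 2048                      # input dim, width, sample size
beta = np.eye(d)[0]                           # assumed teacher direction
X = rng.standard_normal((n, d))               # x ~ N(0, I_d)
y = np.maximum(X @ beta, 0.0)                 # assumed single-index target

W = rng.standard_normal((N, d)) / np.sqrt(d)  # first-layer init
a = rng.choice([-1.0, 1.0], size=N)           # second layer, kept frozen

# One full-batch gradient step on W under square loss.
resid = (np.maximum(X @ W.T, 0.0) @ a) / np.sqrt(N) - y        # (n,)
act = (X @ W.T > 0).astype(float)                              # ReLU'(Wx)
grad_W = ((resid[:, None] * act) * a).T @ X / (n * np.sqrt(N))  # (N, d)
W1 = W - np.sqrt(N) * grad_W                  # large, width-scaled step

# How much of the first-layer mass points along the teacher direction.
align = lambda M: np.linalg.norm(M @ beta) / np.linalg.norm(M)
print(f"alignment before: {align(W):.3f}   after one step: {align(W1):.3f}")
```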

Gradient-based feature learning under structured data

A Mousavi-Hosseini, D Wu, T Suzuki… - Advances in Neural …, 2023 - proceedings.neurips.cc
Recent works have demonstrated that the sample complexity of gradient-based learning of
single-index models, i.e., functions that depend on a one-dimensional projection of the input …
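
A toy sketch of online SGD on a single-index task with a spiked (structured) input covariance; the spike strength `theta`, the cubic-Hermite link, and the step size are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
d, theta, eta, steps = 200, 5.0, 0.05, 20000
beta = np.eye(d)[0]              # target direction (assumed to align with the spike)
g = lambda z: z**3 - 3 * z       # assumed link (third Hermite polynomial)

w = rng.standard_normal(d)
w /= np.linalg.norm(w)
for _ in range(steps):
    # structured input: covariance I + theta * beta beta^T
    x = rng.standard_normal(d) + np.sqrt(theta) * rng.standard_normal() * beta
    err = g(x @ w) - g(x @ beta)                      # student uses the known link
    w -= (eta / d) * err * (3 * (x @ w)**2 - 3) * x   # chain rule through g
    w /= np.linalg.norm(w)                            # keep w on the unit sphere
print("overlap with target direction:", w @ beta)
```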

Learning threshold neurons via edge of stability

K Ahn, S Bubeck, S Chewi, YT Lee… - Advances in Neural …, 2024 - proceedings.neurips.cc
Existing analyses of neural network training often operate under the unrealistic assumption
of an extremely small learning rate. This lies in stark contrast to practical wisdom and …
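
A tiny illustration, not the paper's model: gradient descent on the classic two-parameter loss $L(a,b) = \frac{1}{2}(ab-1)^2$, whose sharpness near the minimum is roughly $a^2 + b^2$. Started unbalanced so the initial sharpness exceeds $2/\eta$, the iterates oscillate and the sharpness tends to hover near $2/\eta$ instead of converging smoothly:

```python
eta = 0.3
a, b = 3.0, 0.34   # near the minimum ab = 1, but unbalanced: sharpness ~ 9 > 2/eta
for t in range(400):
    r = a * b - 1.0
    a, b = a - eta * r * b, b - eta * r * a   # simultaneous GD update
    if t % 80 == 0:
        print(f"t={t:3d}  loss={0.5*r*r:10.6f}  "
              f"sharpness~{a*a + b*b:6.2f}  2/eta={2/eta:.2f}")
```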

Feature learning via mean-field Langevin dynamics: Classifying sparse parities and beyond

T Suzuki, D Wu, K Oko… - Advances in Neural …, 2024 - proceedings.neurips.cc
Neural networks in the mean-field regime are known to be capable of feature learning,
unlike their kernel (NTK) counterparts. Recent works have shown that mean-field neural …
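
A runnable sketch of mean-field Langevin training on the simplest sparse parity (XOR of the first two coordinates). Each neuron carries its own $(a_j, w_j, b_j)$ and the output averages over neurons; the square loss, the rates, and the weight-decay-plus-noise form of the entropic regularization are illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

rng = np.random.default_rng(0)
d, N, n = 10, 1000, 2000
X = rng.choice([-1.0, 1.0], size=(n, d))
y = X[:, 0] * X[:, 1]                 # 2-sparse parity label

a = 0.1 * rng.standard_normal(N)      # per-neuron output weight
W = 0.1 * rng.standard_normal((N, d))
b = 0.1 * rng.standard_normal(N)
eta, lam = 0.5, 1e-3                  # step size, entropy (temperature) weight

for t in range(3000):
    H = np.tanh(X @ W.T + b)          # (n, N) neuron outputs
    r = (H @ a / N - y) / n           # square-loss residual, averaged
    S = (1.0 - H**2) * a              # a_j * sech^2(z_ij), (n, N)
    # Mean-field gradients: each particle sees the loss through the averaged output.
    ga, gW, gb = H.T @ r, (S * r[:, None]).T @ X, S.T @ r
    for p, g in ((a, ga), (W, gW), (b, gb)):
        p -= eta * (g + lam * p)                                    # drift + confinement
        p += np.sqrt(2 * eta * lam) * rng.standard_normal(p.shape)  # Langevin noise

pred = np.sign(np.tanh(X @ W.T + b) @ a / N)
print("train accuracy:", np.mean(pred == y))
```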

Mean-field Langevin dynamics: Exponential convergence and annealing

L Chizat - arXiv preprint arXiv:2202.01009, 2022 - arxiv.org
Noisy particle gradient descent (NPGD) is an algorithm for minimizing, over the space of
measures, convex functionals that include an entropy term. In the many-particle limit, this algorithm …
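
Spelled out, the objects in this abstract take the standard form below (a schematic summary, not a quote from the paper):

```latex
% Objective over probability measures (G convex, \lambda > 0):
F(\mu) = G(\mu) + \lambda \int \log\frac{d\mu}{dx}\, d\mu
% Noisy particle gradient descent, with \hat\mu_k the empirical particle measure:
X^{i}_{k+1} = X^{i}_{k} - \eta\, \nabla \frac{\delta G}{\delta \mu}(\hat\mu_k)(X^{i}_{k})
              + \sqrt{2\eta\lambda}\,\xi^{i}_{k}, \qquad \xi^{i}_{k} \sim \mathcal{N}(0, I)
% Many-particle, small-step limit (the mean-field Langevin PDE):
\partial_t \mu_t = \nabla \cdot \big( \mu_t\, \nabla \tfrac{\delta G}{\delta \mu}(\mu_t) \big)
                   + \lambda \Delta \mu_t
```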

Neural networks efficiently learn low-dimensional representations with SGD

A Mousavi-Hosseini, S Park, M Girotti… - arXiv preprint arXiv …, 2022 - arxiv.org
We study the problem of training a two-layer neural network (NN) of arbitrary width using
stochastic gradient descent (SGD) where the input $\boldsymbol{x} \in \mathbb{R}^d$ is …
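
A sketch of the kind of experiment this abstract describes: online SGD with weight decay on a multi-index target $y = g(\boldsymbol{U}^\top \boldsymbol{x})$, followed by a check of how much first-layer weight mass lies in the target subspace. The link $g$, sizes, and rates are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
d, k, N, steps, eta = 50, 2, 200, 50000, 0.02
U = np.linalg.qr(rng.standard_normal((d, k)))[0]  # orthonormal target subspace
g = lambda z: np.maximum(z, 0).sum(axis=-1)       # assumed link on the k features

W = rng.standard_normal((N, d)) / np.sqrt(d)
a = rng.choice([-1.0, 1.0], N) / np.sqrt(N)       # second layer, kept frozen
for _ in range(steps):
    x = rng.standard_normal(d)
    y = g(x @ U)
    h = np.maximum(W @ x, 0.0)
    resid = a @ h - y
    gW = np.outer(resid * a * (W @ x > 0), x)     # per-example gradient in W
    W -= eta * (gW + 1e-3 * W)                    # SGD step with weight decay

proj = np.linalg.norm(W @ U) / np.linalg.norm(W)  # weight mass in span(U)
print(f"share of first-layer mass in target subspace: {proj:.2f}")
```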

Mean-field Langevin dynamics: Time-space discretization, stochastic gradient, and variance reduction

T Suzuki, D Wu, A Nitanda - NeurIPS, 2023 - openreview.net
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin
dynamics that incorporates a distribution-dependent drift, and it naturally arises from the …
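
In particle form, the "stochastic gradient" and "variance reduction" ingredients in the title amount to replacing the full drift by a cheap unbiased estimate. A generic SVRG-style estimator of the sort combined with MFLD here (a sketch under assumptions; `grad_i` is a hypothetical per-example gradient oracle, not the paper's API):

```python
import numpy as np

def svrg_drift(theta, snapshot, full_grad_at_snapshot, grad_i, idx):
    """Unbiased drift estimate: equals the full gradient in expectation over idx,
    with variance that shrinks as theta approaches the snapshot."""
    return grad_i(theta, idx) - grad_i(snapshot, idx) + full_grad_at_snapshot

# Usage inside one Langevin step (eta: step size, lam: temperature), schematically:
# g = svrg_drift(theta, snapshot, mu_bar, grad_i, np.random.randint(n))
# theta = theta - eta * g + np.sqrt(2 * eta * lam) * np.random.standard_normal(theta.shape)
```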

Two-scale gradient descent ascent dynamics finds mixed Nash equilibria of continuous games: A mean-field perspective

Y Lu - International Conference on Machine Learning, 2023 - proceedings.mlr.press
Finding the mixed Nash equilibria (MNE) of a two-player zero-sum continuous game is an
important and challenging problem in machine learning. A canonical algorithm for finding the …
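
A particle sketch of the two-scale idea: both players' mixed strategies are particle clouds, and the faster (max) player takes much larger noisy ascent steps than the slower (min) player. The payoff $K(x,y) = \sin(x-y)$, the time-scale ratio, and the quadratic confinement are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)
Nx, Ny, lam = 256, 256, 0.05
x = rng.standard_normal(Nx)               # min-player particles
y = rng.standard_normal(Ny)               # max-player particles
eta_x, eta_y = 0.002, 0.2                 # two scales: max player moves faster

for t in range(5000):
    C = np.cos(x[:, None] - y[None, :])   # dK/dx for K(x, y) = sin(x - y)
    gx = C.mean(axis=1)                   # E_nu[ d/dx K ]
    gy = -C.mean(axis=0)                  # E_mu[ d/dy K ]
    x += -eta_x * (gx + 0.1 * x) + np.sqrt(2 * eta_x * lam) * rng.standard_normal(Nx)
    y += +eta_y * (gy - 0.1 * y) + np.sqrt(2 * eta_y * lam) * rng.standard_normal(Ny)

print("mean payoff E[K]:", np.sin(x[:, None] - y[None, :]).mean())
```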

Sampling from the mean-field stationary distribution

Y Kook, MS Zhang, S Chewi… - The Thirty Seventh …, 2024 - proceedings.mlr.press
We study the complexity of sampling from the stationary distribution of a mean-field SDE, or
equivalently, the complexity of minimizing a functional over the space of probability …
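
The stationary distribution in question solves a self-consistency (fixed-point) equation; for the common case $F(\mu) = \int V\,d\mu + \frac12 \iint W\,d\mu\,d\mu + \lambda\,\mathrm{Ent}(\mu)$ it reads, schematically:

```latex
% Self-consistency equation for the stationary law of the mean-field SDE;
% sampling from \mu_* is equivalent to minimizing F over probability measures:
\mu_*(x) \propto \exp\Big( -\tfrac{1}{\lambda} \Big( V(x) + \int W(x, y)\, \mu_*(dy) \Big) \Big)
```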

Uniform-in-time propagation of chaos for mean field Langevin dynamics

F Chen, Z Ren, S Wang - arXiv preprint arXiv:2212.03050, 2022 - arxiv.org
We study the mean field Langevin dynamics and the associated particle system. By
assuming the functional convexity of the energy, we obtain the $L^p$-convergence of the …
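
Schematically, a uniform-in-time propagation-of-chaos statement of this kind bounds the gap between $k$ particles of the $N$-particle system and $k$ independent copies of the mean-field law, with a constant that does not grow with time (the metric and rate below are illustrative, not the paper's exact result):

```latex
% k marginals of the N-particle system stay close to k i.i.d. copies of mu_t,
% uniformly over the time horizon:
\sup_{t \ge 0} \mathcal{W}_2\Big( \mathrm{Law}\big(X^{1,N}_t, \dots, X^{k,N}_t\big),\ \mu_t^{\otimes k} \Big)
  \le C \sqrt{\tfrac{k}{N}}
```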