High-dimensional asymptotics of feature learning: How one gradient step improves the representation
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a two-layer neural network: $f(\boldsymbol{x})=\frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma\dots$
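The one-step setup described in this abstract is concrete enough to sketch in code. The following is a minimal illustration, not the paper's exact construction: it assumes the network acts as $f(\boldsymbol{x})=\frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma(\boldsymbol{W}\boldsymbol{x})$ with $\boldsymbol{W}\in\mathbb{R}^{N\times d}$, a squared loss, synthetic data, and a single full-batch step on $\boldsymbol{W}$ only; the shapes, activation, and step size are all assumptions.

```python
# Minimal sketch of one full-batch gradient step on the first-layer weights W
# (illustrative only; assumes f(x) = a^T sigma(W x) / sqrt(N), squared loss,
# and synthetic data -- these choices are not taken from the abstract).
import numpy as np

rng = np.random.default_rng(0)
n, d, N, eta = 200, 50, 100, 1.0               # samples, input dim, width, step size

X = rng.standard_normal((n, d))                # inputs x_i in R^d
y = rng.standard_normal(n)                     # placeholder targets
W = rng.standard_normal((N, d)) / np.sqrt(d)   # first-layer weights
a = rng.standard_normal(N)                     # second-layer weights (kept fixed)

sigma = np.tanh
dsigma = lambda z: 1.0 - np.tanh(z) ** 2       # sigma'(z)

Z = X @ W.T                                    # pre-activations, shape (n, N)
pred = sigma(Z) @ a / np.sqrt(N)               # f(x_i) for all samples
resid = pred - y

# Gradient of (1/2n) * sum_i (f(x_i) - y_i)^2 with respect to W
grad_W = (dsigma(Z) * resid[:, None] * a[None, :]).T @ X / (n * np.sqrt(N))
W_one_step = W - eta * grad_W                  # the single gradient step studied
```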
Gradient-based feature learning under structured data
Recent works have demonstrated that the sample complexity of gradient-based learning of single-index models, i.e., functions that depend on a one-dimensional projection of the input …
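For reference, a single-index target is commonly written as $f_*(\boldsymbol{x}) = g(\langle \boldsymbol{\theta}, \boldsymbol{x}\rangle)$ for a link function $g:\mathbb{R}\to\mathbb{R}$ and a unit direction $\boldsymbol{\theta}\in\mathbb{R}^d$ (generic notation, not taken from the snippet), so the label depends on $\boldsymbol{x}$ only through a single one-dimensional projection.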
Learning threshold neurons via edge of stability
Existing analyses of neural network training often operate under the unrealistic assumption
of an extremely small learning rate. This lies in stark contrast to practical wisdom and …
Feature learning via mean-field Langevin dynamics: classifying sparse parities and beyond
Neural networks in the mean-field regime are known to be capable of \textit{feature learning}, unlike the kernel (NTK) counterpart. Recent works have shown that mean-field neural …
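A concrete instance of the sparse-parity task named in the title, with illustrative notation: for inputs $\boldsymbol{x}\in\{\pm 1\}^d$ and a subset $S\subset[d]$ of size $k\ll d$, the label is $y=\prod_{i\in S} x_i$, a target that depends on only $k$ coordinates and is a standard hard case for fixed-kernel methods.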
Mean-field Langevin dynamics: Exponential convergence and annealing
L Chizat - arXiv preprint arXiv:2202.01009, 2022 - arxiv.org
Noisy particle gradient descent (NPGD) is an algorithm to minimize convex functions over
the space of measures that include an entropy term. In the many-particle limit, this algorithm …
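A hedged sketch of the objective and the finite-particle update behind NPGD, with generic notation (step size $\eta$, entropy weight $\lambda$, particles $X^i_k$) assumed rather than quoted from the paper:
$$F(\mu)=G(\mu)+\lambda\int \mu\log\mu, \qquad X^i_{k+1}=X^i_k-\eta\,\nabla\frac{\delta G}{\delta\mu}\big(\hat\mu_k\big)(X^i_k)+\sqrt{2\lambda\eta}\,\xi^i_k,$$
where $\hat\mu_k$ is the empirical measure of the particles and the $\xi^i_k$ are independent standard Gaussian vectors.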
Neural networks efficiently learn low-dimensional representations with SGD
We study the problem of training a two-layer neural network (NN) of arbitrary width using stochastic gradient descent (SGD) where the input $\boldsymbol{x}\in\mathbb{R}^d$ is …
Mean-field Langevin dynamics: Time-space discretization, stochastic gradient, and variance reduction
The mean-field Langevin dynamics (MFLD) is a nonlinear generalization of the Langevin
dynamics that incorporates a distribution-dependent drift, and it naturally arises from the …
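For orientation, the mean-field Langevin dynamics minimizing an entropy-regularized objective $F(\mu)=G(\mu)+\lambda\int\mu\log\mu$ is usually written (standard form, assumed here rather than quoted from the abstract) as
$$\mathrm{d}X_t = -\nabla\frac{\delta G}{\delta\mu}(\mu_t)(X_t)\,\mathrm{d}t + \sqrt{2\lambda}\,\mathrm{d}B_t, \qquad \mu_t=\mathrm{Law}(X_t),$$
so the drift depends on the current law $\mu_t$ itself, which is the distribution-dependent term the abstract refers to.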
Two-scale gradient descent ascent dynamics finds mixed Nash equilibria of continuous games: A mean-field perspective
Y Lu - International Conference on Machine Learning, 2023 - proceedings.mlr.press
Finding the mixed Nash equilibria (MNE) of a two-player zero-sum continuous game is an important and challenging problem in machine learning. A canonical algorithm for finding the …
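In the mean-field formulation alluded to here (illustrative notation, not taken from the snippet), a mixed Nash equilibrium is a pair of probability measures $(\mu_*,\nu_*)$ forming a saddle point of
$$\min_{\mu}\max_{\nu}\ \mathbb{E}_{x\sim\mu,\,y\sim\nu}\big[K(x,y)\big],$$
where $K$ is the payoff of the underlying continuous game, possibly with entropic regularization added for both players.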
Sampling from the mean-field stationary distribution
We study the complexity of sampling from the stationary distribution of a mean-field SDE, or
equivalently, the complexity of minimizing a functional over the space of probability …
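The stationary distribution in question can be characterized (standard form for an entropy-regularized functional $G(\mu)+\lambda\int\mu\log\mu$; notation assumed, not quoted) as a fixed point
$$\mu_*(\boldsymbol{x}) \propto \exp\!\Big(-\tfrac{1}{\lambda}\,\tfrac{\delta G}{\delta\mu}(\mu_*)(\boldsymbol{x})\Big),$$
which makes sampling from it equivalent to minimizing the regularized functional over probability measures, as the abstract indicates.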
Uniform-in-time propagation of chaos for mean field Langevin dynamics
We study the mean field Langevin dynamics and the associated particle system. By
assuming the functional convexity of the energy, we obtain the $L^p$-convergence of the …