High-dimensional asymptotics of feature learning: How one gradient step improves the representation
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a
two-layer neural network: $f(\boldsymbol{x})=\frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma$ …
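To make the truncated formula concrete, here is a minimal sketch of the setup the snippet describes; the argument of $\sigma$, the step size $\eta$, and the empirical loss $\widehat{L}$ are generic notation assumed for illustration, not quoted from the paper:

$f(\boldsymbol{x})=\frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma(\boldsymbol{W}\boldsymbol{x}), \qquad \boldsymbol{W}_1=\boldsymbol{W}_0-\eta\,\nabla_{\boldsymbol{W}}\widehat{L}(\boldsymbol{W}_0,\boldsymbol{a}),$

and the question is how the representation $\sigma(\boldsymbol{W}_1\boldsymbol{x})$ obtained after this single step differs from the random-features map $\sigma(\boldsymbol{W}_0\boldsymbol{x})$.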
Deep learning: a statistical viewpoint
The remarkable practical success of deep learning has revealed some major surprises from
a theoretical perspective. In particular, simple gradient methods easily find near-optimal …
Implicit bias of gradient descent for wide two-layer neural networks trained with the logistic loss
Neural networks trained to minimize the logistic (aka cross-entropy) loss with gradient-based
methods are observed to perform well in many supervised classification tasks. Towards …
On the global convergence of gradient descent for over-parameterized models using optimal transport
Many tasks in machine learning and signal processing can be solved by minimizing a
convex function of a measure. This includes sparse spikes deconvolution or training a …
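As a hedged sketch of the problem class this snippet refers to, in generic notation (the symbols $R$, $\Phi$, $\theta_i$, $m$ are assumptions, not the paper's): one minimizes a convex functional of a measure, and over-parameterization means running gradient descent on a large but finite particle approximation,

$\min_{\mu}\; F(\mu)=R\Big(\int \Phi(\theta)\,\mathrm{d}\mu(\theta)\Big), \qquad \mu\approx\frac{1}{m}\sum_{i=1}^{m}\delta_{\theta_i},$

where, for a two-layer network, $\Phi(\theta)$ is the prediction function of a single neuron with parameters $\theta$ and gradient descent acts on the particle positions $\theta_1,\dots,\theta_m$.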
Mean-field theory of two-layers neural networks: dimension-free bounds and kernel limit
We consider learning two-layer neural networks using stochastic gradient descent. The
mean-field description of this learning dynamics approximates the evolution of the network …
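For context, a generic form of the mean-field description mentioned in the snippet (a standard schematic, with the effective potential $\Psi$ assumed rather than quoted): in the many-neuron limit, the distribution $\rho_t$ of neuron parameters evolves as

$\partial_t\rho_t(\theta)=\nabla_\theta\cdot\big(\rho_t(\theta)\,\nabla_\theta\Psi(\theta;\rho_t)\big),$

where $\Psi(\theta;\rho)$ collects the coupling of a single neuron to the data and to the current parameter distribution $\rho$.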
Mean-field Langevin dynamics: Exponential convergence and annealing
L Chizat - arXiv preprint arXiv:2202.01009, 2022 - arxiv.org
Noisy particle gradient descent (NPGD) is an algorithm to minimize convex functions over
the space of measures that include an entropy term. In the many-particle limit, this algorithm …
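A sketch of the objective and limiting dynamics in question, under standard mean-field Langevin notation (assumed, not quoted from the abstract): for a convex $F$ over probability measures and noise level $\lambda>0$,

$\min_{\mu}\; F(\mu)+\lambda\int\log\Big(\frac{\mathrm{d}\mu}{\mathrm{d}\theta}\Big)\,\mathrm{d}\mu, \qquad \mathrm{d}\theta_t=-\nabla_\theta\frac{\delta F}{\delta\mu}[\mu_t](\theta_t)\,\mathrm{d}t+\sqrt{2\lambda}\,\mathrm{d}B_t,$

where $\mu_t$ is the law of $\theta_t$; noisy particle gradient descent is the finite-particle, discrete-time analogue, and annealing refers to letting $\lambda$ decrease over time.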
Gradient descent on infinitely wide neural networks: Global convergence and generalization
Many supervised machine learning methods are naturally cast as optimization problems. For
prediction models which are linear in their parameters, this often leads to convex problems …
Convex analysis of the mean field Langevin dynamics
As an example of the nonlinear Fokker-Planck equation, the mean field Langevin dynamics
has recently attracted attention due to its connection to (noisy) gradient descent on infinitely wide …
Sparse optimization on measures with over-parameterized gradient descent
L Chizat - Mathematical Programming, 2022 - Springer
Minimizing a convex function of a measure with a sparsity-inducing penalty is a typical
problem arising, e.g., in sparse spikes deconvolution or two-layer neural networks training …
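A schematic of the sparsity-penalized problem the snippet alludes to, in generic notation (assumed): minimize a convex loss of a linear measurement of a nonnegative measure plus its total mass,

$\min_{\mu\ge 0}\; R\Big(\int\Phi(\theta)\,\mathrm{d}\mu(\theta)\Big)+\lambda\,\mu(\Theta),$

where the total-mass term acts as an $\ell_1$-type penalty that favors finitely supported (sparse) minimizers.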
Feature learning via mean-field Langevin dynamics: classifying sparse parities and beyond
Neural networks in the mean-field regime are known to be capable of feature learning,
unlike the kernel (NTK) counterpart. Recent works have shown that mean-field neural …
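For reference, the sparse parities named in the title are the standard targets (a textbook definition, not quoted from the paper): for inputs $\boldsymbol{x}\in\{\pm 1\}^d$ and an unknown index set $S\subset\{1,\dots,d\}$ with $|S|=k\ll d$,

$y(\boldsymbol{x})=\prod_{i\in S}x_i,$

so the label depends on only $k$ of the $d$ coordinates, a common benchmark for distinguishing feature learning from fixed-kernel methods.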