Landscape and training regimes in deep learning
Deep learning algorithms are responsible for a technological revolution in a variety of tasks
including image recognition or Go playing. Yet, why they work is not understood. Ultimately …
Scaling description of generalization with number of parameters in deep learning
Supervised deep learning involves the training of neural networks with a large number N of
parameters. For large enough N, in the so-called over-parametrized regime, one can …
Jamming transition as a paradigm to understand the loss landscape of deep neural networks
Deep learning has been immensely successful at a variety of tasks, ranging from
classification to artificial intelligence. Learning corresponds to fitting training data, which is …
Dynamical mean-field theory for stochastic gradient descent in gaussian mixture classification
We analyze in a closed form the learning dynamics of stochastic gradient descent (SGD) for
a single layer neural network classifying a high-dimensional Gaussian mixture where each …
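The entry describes training a single-layer network on a high-dimensional Gaussian mixture with SGD. Below is a minimal sketch of that setup, not the paper's dynamical mean-field analysis; the two-cluster mixture, logistic loss, and all hyperparameters are illustrative assumptions.

```python
import numpy as np

# Illustrative setup: two-cluster Gaussian mixture in dimension d, labels +-1,
# single-layer (linear) classifier trained with mini-batch SGD on the logistic loss.
rng = np.random.default_rng(0)

d, n = 500, 2000
mu = rng.normal(size=d) / np.sqrt(d)             # cluster mean (assumed scaling)
y = rng.choice([-1.0, 1.0], size=n)              # balanced labels
X = y[:, None] * mu[None, :] + rng.normal(size=(n, d)) / np.sqrt(d)

w = np.zeros(d)                                  # single-layer weights
lr, batch, steps = 0.5, 32, 2000

def logistic_grad(w, Xb, yb):
    """Gradient of the mean logistic loss log(1 + exp(-y * x.w)) on a mini-batch."""
    margins = np.clip(yb * (Xb @ w), -30.0, 30.0)   # clip for numerical stability
    coeff = -yb / (1.0 + np.exp(margins))
    return (Xb * coeff[:, None]).mean(axis=0)

for _ in range(steps):
    idx = rng.choice(n, size=batch, replace=False)  # draw a fresh mini-batch
    w -= lr * logistic_grad(w, X[idx], y[idx])

print(f"training accuracy: {np.mean(np.sign(X @ w) == y):.3f}")
```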
The self-organized criticality paradigm in economics & finance
JP Bouchaud - arXiv preprint arXiv:2407.10284, 2024 - arxiv.org
"Self-Organised Criticality" (SOC) is the mechanism by which complex systems
spontaneously settle close to a critical point, at the edge between stability and chaos, and …
Memorizing without overfitting: Bias, variance, and interpolation in overparameterized models
The bias-variance trade-off is a central concept in supervised learning. In classical statistics,
increasing the complexity of a model (e.g., number of parameters) reduces bias but also …
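The entry concerns bias, variance, and interpolation in overparameterized models. A minimal sketch, assuming a random-feature least-squares model as a stand-in for model complexity; sweeping the number of features p across the interpolation threshold p = n traces the behaviour the abstract alludes to. The sizes, the tanh feature map, and the noisy linear teacher are illustrative choices, not the paper's setup.

```python
import numpy as np

# Illustrative setup: random-feature least-squares regression on a noisy linear
# teacher. Varying the number of random features p plays the role of model
# complexity; test error typically peaks near the interpolation point p = n.
rng = np.random.default_rng(1)

d, n, n_test = 20, 100, 1000
w_star = rng.normal(size=d)                      # teacher weights (illustrative)

def sample(m):
    X = rng.normal(size=(m, d))
    y = X @ w_star + 0.5 * rng.normal(size=m)    # noisy linear labels
    return X, y

X_train, y_train = sample(n)
X_test, y_test = sample(n_test)

for p in [10, 50, 90, 100, 110, 200, 1000]:      # number of random features
    F = rng.normal(size=(d, p)) / np.sqrt(d)     # fixed random projection
    phi_tr, phi_te = np.tanh(X_train @ F), np.tanh(X_test @ F)
    a = np.linalg.pinv(phi_tr) @ y_train         # minimum-norm least-squares fit
    test_mse = np.mean((phi_te @ a - y_test) ** 2)
    print(f"p = {p:5d}   test MSE = {test_mse:.3f}")
```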
The effective noise of stochastic gradient descent
F Mignacco, P Urbani - Journal of Statistical Mechanics: Theory …, 2022 - iopscience.iop.org
Stochastic gradient descent (SGD) is the workhorse algorithm of deep learning technology.
At each step of the training phase, a mini-batch of samples is drawn from the training dataset …
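The entry is about the effective noise SGD injects through mini-batch sampling. A minimal sketch, assuming a least-squares loss: it compares mini-batch gradients with the full-batch gradient at a fixed iterate to estimate the size of that noise. All names and sizes below are illustrative, not the paper's protocol.

```python
import numpy as np

# Illustrative estimate of SGD's effective noise: at a fixed point w in parameter
# space, sample many mini-batch gradients of a least-squares loss and measure
# their spread around the full-batch (noise-free) gradient.
rng = np.random.default_rng(2)

d, n, batch = 50, 5000, 32
X = rng.normal(size=(n, d)) / np.sqrt(d)
y = X @ rng.normal(size=d) + 0.1 * rng.normal(size=n)
w = rng.normal(size=d)                           # arbitrary current iterate

def grad(Xb, yb, w):
    """Gradient of the mean squared error on the given samples."""
    return 2.0 * Xb.T @ (Xb @ w - yb) / len(yb)

g_full = grad(X, y, w)                           # full-batch gradient

deviations = []
for _ in range(500):
    idx = rng.choice(n, size=batch, replace=False)
    deviations.append(grad(X[idx], y[idx], w) - g_full)
deviations = np.array(deviations)

noise_var = deviations.var(axis=0).sum()         # trace of the noise covariance
print(f"per-step SGD noise variance (trace): {noise_var:.4f}")
```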
The nature of non-phononic excitations in disordered systems
W Schirmacher, M Paoluzzi, FC Mocanu… - Nature …, 2024 - nature.com
The frequency scaling exponent of low-frequency excitations in microscopically small
glasses, which do not allow for the existence of waves (phonons), has been in the focus of …
Gardner formula for Ising perceptron models at small densities
E Bolthausen, S Nakajima, N Sun… - Conference on Learning …, 2022 - proceedings.mlr.press
We consider the Ising perceptron model with N spins and M = Nα patterns, with a
general activation function U that is bounded above. For U bounded away from zero, or U a …
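The entry defines the Ising perceptron with N spins and M = Nα patterns. A minimal brute-force sketch of that model for tiny N, assuming U is the indicator of a nonnegative overlap so that the partition function simply counts solutions; it only illustrates the constraint-satisfaction setup, not the Gardner-formula computation itself.

```python
import itertools
import numpy as np

# Illustrative brute force: count binary weight vectors (spins) that classify all
# M = alpha * N random +-1 patterns as positive, i.e. solutions of a tiny Ising
# perceptron with step activation. Exhaustive enumeration is feasible only for small N.
rng = np.random.default_rng(3)

N, alpha = 15, 0.5
M = int(alpha * N)
patterns = rng.choice([-1.0, 1.0], size=(M, N))

count = 0
for spins in itertools.product([-1.0, 1.0], repeat=N):
    w = np.array(spins)
    if np.all(patterns @ w >= 0):                # every pattern on the positive side
        count += 1

entropy_per_spin = np.log(count) / N if count else -np.inf
print(f"solutions: {count} of 2**{N}, log(count)/N = {entropy_per_spin:.3f}")
```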
Glasses and aging: a statistical mechanics perspective
We start with a few concise definitions of the most important concepts discussed in this entry.
Glass transition: For molecular liquids, the glass transition denotes a crossover from a …