Landscape and training regimes in deep learning

M Geiger, L Petrini, M Wyart - Physics Reports, 2021 - Elsevier
Deep learning algorithms are responsible for a technological revolution in a variety of tasks
including image recognition and Go playing. Yet why they work is not understood. Ultimately …

Scaling description of generalization with number of parameters in deep learning

M Geiger, A Jacot, S Spigler, F Gabriel… - Journal of Statistical …, 2020 - iopscience.iop.org
Supervised deep learning involves the training of neural networks with a large number N of
parameters. For large enough N, in the so-called over-parametrized regime, one can …
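
The snippet describes sweeping the parameter count N into the over-parametrized regime. As a rough, self-contained illustration (not the paper's architecture or data; every choice below is an assumption), one can sweep the width of a one-hidden-layer network and track test error against an approximate parameter count:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Assumed toy setup: synthetic binary classification, one hidden layer.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.5, random_state=0)

for width in [2, 8, 32, 128, 512]:
    clf = MLPClassifier(hidden_layer_sizes=(width,), max_iter=2000, random_state=0)
    clf.fit(X_tr, y_tr)
    # Parameter count N: weights and biases of hidden layer plus output layer.
    n_params = 20 * width + width + width + 1
    print(f"N = {n_params:6d}   test error = {1 - clf.score(X_te, y_te):.3f}")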

Jamming transition as a paradigm to understand the loss landscape of deep neural networks

M Geiger, S Spigler, S d'Ascoli, L Sagun, M Baity-Jesi… - Physical Review E, 2019 - APS
Deep learning has been immensely successful at a variety of tasks, ranging from
classification to artificial intelligence. Learning corresponds to fitting training data, which is …

Dynamical mean-field theory for stochastic gradient descent in Gaussian mixture classification

F Mignacco, F Krzakala, P Urbani… - Advances in Neural …, 2020 - proceedings.neurips.cc
We analyze in a closed form the learning dynamics of stochastic gradient descent (SGD) for
a single layer neural network classifying a high-dimensional Gaussian mixture where each …
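
Since the snippet spells out the setup (single-layer network, SGD, high-dimensional Gaussian mixture), here is a minimal numpy sketch of that kind of experiment; the scalings, learning rate, and loss are assumptions, not the paper's exact model:

import numpy as np

rng = np.random.default_rng(0)
d, n, lr = 500, 4000, 0.05
mu = rng.standard_normal(d) / np.sqrt(d)                 # cluster mean
labels = rng.choice([-1, 1], size=n)
X = labels[:, None] * mu + rng.standard_normal((n, d))   # x = y*mu + Gaussian noise

w = np.zeros(d)
for t in range(n):                                        # online SGD, one sample per step
    x, y = X[t], labels[t]
    margin = y * (x @ w) / np.sqrt(d)
    grad = -y * x / np.sqrt(d) / (1.0 + np.exp(margin))   # logistic-loss gradient
    w -= lr * grad

print("training accuracy after one SGD pass:", np.mean(np.sign(X @ w) == labels))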

The self-organized criticality paradigm in economics & finance

JP Bouchaud - arXiv preprint arXiv:2407.10284, 2024 - arxiv.org
"Self-Organised Criticality" (SOC) is the mechanism by which complex systems
spontaneously settle close to a critical point, at the edge between stability and chaos, and …

Memorizing without overfitting: Bias, variance, and interpolation in overparameterized models

JW Rocks, P Mehta - Physical review research, 2022 - APS
The bias-variance trade-off is a central concept in supervised learning. In classical statistics,
increasing the complexity of a model (e.g., the number of parameters) reduces bias but also …
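
The phenomenon named in the title (interpolation without overfitting in overparameterized models) can be probed generically with min-norm least squares on random features, sweeping the parameter count past the interpolation point; this is an illustration under assumptions of my choosing, not the ensemble studied in the paper:

import numpy as np

rng = np.random.default_rng(1)
n_train, n_test, d = 100, 1000, 30
w_true = rng.standard_normal(d)

def sample(n):
    X = rng.standard_normal((n, d))
    return X, X @ w_true + 0.5 * rng.standard_normal(n)

X_tr, y_tr = sample(n_train)
X_te, y_te = sample(n_test)

for p in [10, 50, 90, 100, 110, 200, 1000]:               # sweep parameter count p
    W = rng.standard_normal((d, p)) / np.sqrt(d)          # random ReLU features
    F_tr, F_te = np.maximum(X_tr @ W, 0), np.maximum(X_te @ W, 0)
    beta, *_ = np.linalg.lstsq(F_tr, y_tr, rcond=None)    # min-norm fit when p > n_train
    print(f"p = {p:5d}   test MSE = {np.mean((F_te @ beta - y_te) ** 2):.3f}")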

The effective noise of stochastic gradient descent

F Mignacco, P Urbani - Journal of Statistical Mechanics: Theory …, 2022 - iopscience.iop.org
Stochastic gradient descent (SGD) is the workhorse algorithm of deep learning technology.
At each step of the training phase, a mini-batch of samples is drawn from the training dataset …
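
Taking the snippet literally (a mini-batch is drawn at each step), the effective noise can be probed numerically as the spread of mini-batch gradients around the full-batch gradient at a fixed point in parameter space; the least-squares loss and sizes below are assumptions for illustration, not the paper's construction:

import numpy as np

rng = np.random.default_rng(2)
n, d, batch = 1000, 50, 32
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + rng.standard_normal(n)
w = rng.standard_normal(d)                         # fixed point in parameter space

def grad(idx):
    Xb, yb = X[idx], y[idx]
    return Xb.T @ (Xb @ w - yb) / len(idx)         # gradient of (1/2)·mean squared error

g_full = grad(np.arange(n))
noise = np.array([grad(rng.choice(n, size=batch, replace=False)) - g_full
                  for _ in range(500)])
print("mean per-coordinate std of gradient noise:", np.std(noise, axis=0).mean())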

The nature of non-phononic excitations in disordered systems

W Schirmacher, M Paoluzzi, FC Mocanu… - Nature …, 2024 - nature.com
The frequency scaling exponent of low-frequency excitations in microscopically small
glasses, which do not allow for the existence of waves (phonons), has been the focus of …
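
For reference, the "frequency scaling exponent" mentioned in the snippet is usually defined through the low-frequency density of vibrational states; the notation below follows the common convention and is not taken from the paper itself:

\[
  D(\omega) \;\propto\; \omega^{s} \quad (\omega \to 0),
  \qquad \text{vs. the Debye phonon law} \quad
  D_{\mathrm{Debye}}(\omega) \;\propto\; \omega^{d-1},
\]

where s is the non-phononic scaling exponent under debate and d is the spatial dimension.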

Gardner formula for Ising perceptron models at small densities

E Bolthausen, S Nakajima, N Sun… - Conference on Learning …, 2022 - proceedings.mlr.press
We consider the Ising perceptron model with N spins and M = αN patterns, with a
general activation function U that is bounded above. For U bounded away from zero, or U a …
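
The abstract fixes the model: N Ising spins, M = αN patterns, and an activation U bounded above. In the usual Gardner-volume convention (notation assumed here, with i.i.d. standard Gaussian patterns g^a), the partition function whose asymptotics the Gardner formula describes reads:

\[
  Z \;=\; \sum_{\sigma \in \{\pm 1\}^{N}} \; \prod_{a=1}^{M}
  U\!\left( \frac{g^{a} \cdot \sigma}{\sqrt{N}} \right),
  \qquad M = \alpha N .
\]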

Glasses and aging: A statistical mechanics perspective

F Arceri, FP Landes, L Berthier, G Biroli - Statistical and Nonlinear Physics, 2022 - Springer
We start with a few concise definitions of the most important concepts discussed in this entry.
Glass transition: for molecular liquids, the glass transition denotes a crossover from a …