SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics

E Abbe, EB Adsera… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We investigate the time complexity of SGD learning on fully-connected neural networks with
isotropic data. We put forward a complexity measure, the leap, which measures how …

Generalization on the unseen, logic reasoning and degree curriculum

E Abbe, S Bengio, A Lotfi, K Rizk - Journal of Machine Learning Research, 2024 - jmlr.org
This paper considers the learning of logical (Boolean) functions with a focus on the
generalization on the unseen (GOTU) setting, a strong case of out-of-distribution …

Provable guarantees for neural networks via gradient feature learning

Z Shi, J Wei, Y Liang - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Neural networks have achieved remarkable empirical performance, while the current
theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent …

Towards better out-of-distribution generalization of neural algorithmic reasoning tasks

S Mahdavi, K Swersky, T Kipf, M Hashemi… - arxiv preprint arxiv …

Mapping of attention mechanisms to a generalized Potts model
R Rende, F Gerace, A Laio, S Goldt - Physical Review Research, 2024 - APS
Transformers are neural networks that revolutionized natural language processing and
machine learning. They process sequences of inputs, like words, using a mechanism called …

Boolformer: Symbolic regression of logic functions with transformers

S d'Ascoli, S Bengio, J Susskind, E Abbé - arxiv preprint arxiv:2309.12207, 2023 - arxiv.org
In this work, we introduce Boolformer, the first Transformer architecture trained to perform
end-to-end symbolic regression of Boolean functions. First, we show that it can predict …

How Far Can Transformers Reason? The Locality Barrier and Inductive Scratchpad

E Abbe, S Bengio, A Lotfi, C Sandon… - arxiv preprint arxiv …, 2024 - arxiv.org
Can Transformers predict new syllogisms by composing established ones? More generally,
what type of targets can be learned by such models from scratch? Recent works show that …

Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States

N Razin, Y Alexander, E Cohen-Karlik, R Giryes… - arxiv preprint arxiv …, 2024 - arxiv.org
In modern machine learning, models can often fit training data in numerous ways, some of
which perform well on unseen (test) data, while others do not. Remarkably, in such cases …