SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics

E Abbe, EB Adsera… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We investigate the time complexity of SGD learning on fully-connected neural networks with
isotropic data. We put forward a complexity measure, the leap, which measures how …

How far can transformers reason? the globality barrier and inductive scratchpad

E Abbe, S Bengio, A Lotfi… - Advances in Neural …, 2025 - proceedings.neurips.cc
Can Transformers predict new syllogisms by composing established ones? More generally,
what type of targets can be learned by such models from scratch? Recent works show that …

Generalization on the unseen, logic reasoning and degree curriculum

E Abbe, S Bengio, A Lotfi, K Rizk - Journal of Machine Learning Research, 2024 - jmlr.org
This paper considers the learning of logical (Boolean) functions with a focus on the
generalization on the unseen (GOTU) setting, a strong case of out-of-distribution …

Provable guarantees for neural networks via gradient feature learning

Z Shi, J Wei, Y Liang - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Neural networks have achieved remarkable empirical performance, while the current
theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent …

Towards better out-of-distribution generalization of neural algorithmic reasoning tasks

S Mahdavi, K Swersky, T Kipf, M Hashemi… - arxiv preprint arxiv …, 2022 - arxiv.org

Mapping of attention mechanisms to a generalized Potts model
R Rende, F Gerace, A Laio, S Goldt - Physical Review Research, 2024 - APS
Transformers are neural networks that revolutionized natural language processing and
machine learning. They process sequences of inputs, like words, using a mechanism called …

Transfer learning beyond bounded density ratios

A Kalavasis, I Zadik, M Zampetakis - arxiv preprint arxiv:2403.11963, 2024 - arxiv.org
We study the fundamental problem of transfer learning where a learning algorithm collects
data from some source distribution $P$ but needs to perform well with respect to a different …

VarBench: Robust language model benchmarking through dynamic variable perturbation

K Qian, S Wan, C Tang, Y Wang, X Zhang… - arxiv preprint arxiv …, 2024 - arxiv.org
As large language models achieve impressive scores on traditional benchmarks, an
increasing number of researchers are becoming concerned about benchmark data leakage …

[PDF] Boolformer: Symbolic regression of logic functions with transformers

S d'Ascoli, S Bengio, J Susskind… - arxiv preprint arxiv …, 2023 - bengio.abracadoudou.com
In this work, we introduce Boolformer, the first Transformer architecture trained to perform
end-to-end symbolic regression of Boolean functions. First, we show that it can predict …

The Buffer Mechanism for Multi-Step Information Reasoning in Language Models

Z Wang, Y Wang, Z Zhang, Z Zhou, H Jin, T Hu… - arxiv preprint arxiv …, 2024 - arxiv.org
Large language models have consistently struggled with complex reasoning tasks, such as
mathematical problem-solving. Investigating the internal reasoning mechanisms of these …