SGD learning on neural networks: leap complexity and saddle-to-saddle dynamics
We investigate the time complexity of SGD learning on fully-connected neural networks with
isotropic data. We put forward a complexity measure, the leap, which measures how …
Generalization on the unseen, logic reasoning and degree curriculum
This paper considers the learning of logical (Boolean) functions with a focus on the
generalization on the unseen (GOTU) setting, a strong case of out-of-distribution …
Provable guarantees for neural networks via gradient feature learning
Neural networks have achieved remarkable empirical performance, while the current
theoretical analysis is not adequate for understanding their success, e.g., the Neural Tangent …
Towards better out-of-distribution generalization of neural algorithmic reasoning tasks
Boolformer: Symbolic regression of logic functions with transformers
In this work, we introduce Boolformer, the first Transformer architecture trained to perform
end-to-end symbolic regression of Boolean functions. First, we show that it can predict …
How Far Can Transformers Reason? The Locality Barrier and Inductive Scratchpad
Can Transformers predict new syllogisms by composing established ones? More generally,
what type of targets can be learned by such models from scratch? Recent works show that …
Implicit Bias of Policy Gradient in Linear Quadratic Control: Extrapolation to Unseen Initial States
In modern machine learning, models can often fit training data in numerous ways, some of
which perform well on unseen (test) data, while others do not. Remarkably, in such cases …