Representational strengths and limitations of transformers
Attention layers, as commonly used in transformers, form the backbone of modern deep
learning, yet there is no mathematical description of their benefits and deficiencies as …
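For reference, a minimal sketch of the scaled dot-product attention layer this paper studies (a single head in NumPy; the shapes and the self-attention toy input are illustrative, not the paper's notation):

```python
import numpy as np

def attention(Q, K, V):
    """Single-head scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # (n, n) pairwise similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # each output is a convex combination of the values

# Toy self-attention: 4 tokens with 8-dimensional embeddings (Q = K = V = X)
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
print(attention(X, X, X).shape)                     # (4, 8)
```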
Hardness of noise-free learning for two-hidden-layer neural networks
We give superpolynomial statistical query (SQ) lower bounds for learning two-hidden-layer
ReLU networks with respect to Gaussian inputs in the standard (noise-free) model. No …
Improved bounds on neural complexity for representing piecewise linear functions
A deep neural network using rectified linear units represents a continuous piecewise linear
(CPWL) function and vice versa. Recent results in the literature estimated that the number of …
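To illustrate the forward direction of this equivalence, here is a small sketch (hypothetical random weights, NumPy) that evaluates a one-hidden-layer ReLU network on a 1D grid and checks that its second differences vanish away from finitely many breakpoints, i.e., that the network computes a continuous piecewise linear function:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical one-hidden-layer ReLU network f(x) = w2 . relu(W1 x + b1)
W1, b1 = rng.standard_normal((5, 1)), rng.standard_normal(5)
w2 = rng.standard_normal(5)

def f(x):
    return w2 @ np.maximum(W1 @ np.atleast_1d(x) + b1, 0.0)

xs = np.linspace(-3.0, 3.0, 2001)
ys = np.array([f(x) for x in xs])
second_diff = np.abs(np.diff(ys, 2))
# Second differences are zero except at grid intervals containing a kink,
# so f is piecewise linear with at most 5 breakpoints (one per hidden unit).
print((second_diff > 1e-8).sum())  # small: ~2 entries per detected kink
```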
Towards lower bounds on the depth of ReLU neural networks
We contribute to a better understanding of the class of functions that is represented by a
neural network with ReLU activations and a given architecture. Using techniques from mixed …
Optimization-based separations for neural networks
Depth separation results propose a possible theoretical explanation for the benefits of deep
neural networks over shallower architectures, establishing that the former possess superior …
Width is less important than depth in ReLU neural networks
We solve an open question from Lu et al. (2017) by showing that any target network with
inputs in $\mathbb{R}^d$ can be approximated by a width $O(d)$ network (independent …
On the optimal memorization power of ReLU neural networks
We study the memorization power of feedforward ReLU neural networks. We show that such
networks can memorize any $N$ points that satisfy a mild separability assumption using …
The connection between approximation, depth separation and learnability in neural networks
Several recent works have shown separation results between deep neural networks, and
hypothesis classes with inferior approximation capacity such as shallow networks or kernel …
Exponential separations in symmetric neural networks
In this work we demonstrate a novel separation between symmetric neural network
architectures. Specifically, we consider the Relational Network …
Size and depth of monotone neural networks: interpolation and approximation
Monotone functions and data sets arise in a variety of applications. We study the
interpolation problem for monotone data sets: The input is a monotone data set with $n$ …