Representational strengths and limitations of transformers

C Sanford, DJ Hsu, M Telgarsky - Advances in Neural …, 2024 - proceedings.neurips.cc
Attention layers, as commonly used in transformers, form the backbone of modern deep
learning, yet there is no mathematical description of their benefits and deficiencies as …

Hardness of noise-free learning for two-hidden-layer neural networks

S Chen, A Gollakota, A Klivans… - Advances in Neural …, 2022 - proceedings.neurips.cc
We give superpolynomial statistical query (SQ) lower bounds for learning two-hidden-layer
ReLU networks with respect to Gaussian inputs in the standard (noise-free) model. No …

Improved bounds on neural complexity for representing piecewise linear functions

KL Chen, H Garudadri, BD Rao - Advances in Neural …, 2022 - proceedings.neurips.cc
A deep neural network using rectified linear units represents a continuous piecewise linear
(CPWL) function and vice versa. Recent results in the literature estimated that the number of …
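
The ReLU-to-CPWL direction stated in the snippet above is easy to see on a toy case. The following sketch is illustrative only (not taken from the paper): a one-hidden-layer ReLU network computing $|x|$, a CPWL function with a single breakpoint at $x=0$. The converse direction and the parameter-counting bounds are what the paper actually studies.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

# A tiny one-hidden-layer ReLU network computing f(x) = |x|,
# a continuous piecewise linear (CPWL) function with one breakpoint.
# Weights are chosen by hand purely for illustration.
W1 = np.array([[1.0], [-1.0]])   # hidden layer: two units, input dimension 1
b1 = np.zeros(2)
w2 = np.array([1.0, 1.0])        # output layer: sum of the two ReLU units

def net(x):
    h = relu(W1 @ np.atleast_1d(x) + b1)
    return w2 @ h

for x in (-2.0, -0.5, 0.0, 1.5):
    print(x, net(x))  # matches abs(x) at every input
```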

Towards lower bounds on the depth of ReLU neural networks

C Hertrich, A Basu, M Di Summa… - Advances in Neural …, 2021 - proceedings.neurips.cc
We contribute to a better understanding of the class of functions that is represented by a
neural network with ReLU activations and a given architecture. Using techniques from mixed …

Optimization-based separations for neural networks

I Safran, J Lee - Conference on Learning Theory, 2022 - proceedings.mlr.press
Depth separation results propose a possible theoretical explanation for the benefits of deep
neural networks over shallower architectures, establishing that the former possess superior …

Width is less important than depth in ReLU neural networks

G Vardi, G Yehudai, O Shamir - Conference on learning …, 2022 - proceedings.mlr.press
We solve an open question from Lu et al. (2017) by showing that any target network with
inputs in $\mathbb{R}^d$ can be approximated by a width $O(d)$ network (independent …

On the optimal memorization power of ReLU neural networks

G Vardi, G Yehudai, O Shamir - arXiv preprint arXiv:2110.03187, 2021 - arxiv.org
We study the memorization power of feedforward ReLU neural networks. We show that such
networks can memorize any $N$ points that satisfy a mild separability assumption using …
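
For contrast with the snippet above, a naive construction memorizes $N$ one-dimensional points with $O(N)$ ReLU units via piecewise linear interpolation; the paper's point is that far fewer parameters suffice under a mild separation assumption. The sketch below is that naive baseline (my own illustration), not the paper's construction.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def memorize_1d(xs, ys):
    """Return a one-hidden-layer ReLU net f with f(xs[i]) == ys[i].

    Naive O(N)-neuron construction: piecewise linear interpolation of
    the (sorted) points, written as a sum of shifted ReLUs.
    """
    order = np.argsort(xs)
    xs, ys = np.asarray(xs, float)[order], np.asarray(ys, float)[order]
    slopes = np.diff(ys) / np.diff(xs)        # slope on each interval
    coeffs = np.diff(slopes, prepend=0.0)     # slope change at each breakpoint

    def f(x):
        return ys[0] + coeffs @ relu(x - xs[:-1])

    return f

xs = [0.0, 1.0, 2.5, 4.0]
ys = [1.0, -2.0, 0.5, 3.0]
f = memorize_1d(xs, ys)
print([round(float(f(x)), 6) for x in xs])  # recovers ys exactly
```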

The connection between approximation, depth separation and learnability in neural networks

E Malach, G Yehudai… - … on Learning Theory, 2021 - proceedings.mlr.press
Several recent works have shown separation results between deep neural networks and
hypothesis classes with inferior approximation capacity, such as shallow networks or kernel …

Exponential separations in symmetric neural networks

A Zweig, J Bruna - Advances in Neural Information …, 2022 - proceedings.neurips.cc
In this work we demonstrate a novel separation between symmetric neural network
architectures. Specifically, we consider the Relational Network …

Size and depth of monotone neural networks: interpolation and approximation

D Mikulincer, D Reichman - Advances in Neural …, 2022 - proceedings.neurips.cc
Monotone functions and data sets arise in a variety of applications. We study the
interpolation problem for monotone data sets: The input is a monotone data set with $n$ …