Quantum variational algorithms are swamped with traps

ER Anschuetz, BT Kiani - Nature Communications, 2022 - nature.com
One of the most important properties of classical neural networks is how surprisingly
trainable they are, though their training algorithms typically rely on optimizing complicated …

FL-NTK: A neural tangent kernel-based framework for federated learning analysis

B Huang, X Li, Z Song, X Yang - International Conference on …, 2021 - proceedings.mlr.press
Federated Learning (FL) is an emerging learning scheme that allows different distributed
clients to train deep neural networks together without data sharing. Neural networks have …
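
As background for the kernel named in the title (a standard definition, not a claim about this paper's analysis): for a network $f_\theta(x)$ with parameters $\theta$, the neural tangent kernel is $K(x, x') = \langle \nabla_\theta f_\theta(x), \nabla_\theta f_\theta(x') \rangle$; in the infinite-width limit this kernel stays essentially fixed during gradient-descent training, which is what makes NTK-style convergence analyses tractable.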

When deep learning meets polyhedral theory: A survey

J Huchette, G Muñoz, T Serra, C Tsay - arXiv preprint arXiv:2305.00241, 2023 - arxiv.org
In the past decade, deep learning became the prevalent methodology for predictive
modeling thanks to the remarkable accuracy of deep neural networks in tasks such as …

Provably learning a multi-head attention layer

S Chen, Y Li - arXiv preprint arXiv:2402.04084, 2024 - arxiv.org
The multi-head attention layer is one of the key components of the transformer architecture
that sets it apart from traditional feed-forward models. Given a sequence length $ k …
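
To make the object being learned concrete, here is a minimal sketch of a standard multi-head softmax attention layer; the exact parameterization and sizes in the paper may differ, and the weights below are placeholders.

import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(X, Wq, Wk, Wv, Wo):
    # X: (k, d) sequence; Wq/Wk/Wv: per-head (d, d_head) projections; Wo: (h*d_head, d).
    heads = []
    for Q_proj, K_proj, V_proj in zip(Wq, Wk, Wv):
        Q, K, V = X @ Q_proj, X @ K_proj, X @ V_proj          # each (k, d_head)
        A = softmax(Q @ K.T / np.sqrt(K.shape[1]), axis=-1)   # (k, k) attention weights
        heads.append(A @ V)                                    # (k, d_head)
    return np.concatenate(heads, axis=1) @ Wo                  # (k, d)

rng = np.random.default_rng(0)
k, d, h, d_head = 5, 8, 2, 4
X = rng.normal(size=(k, d))
Wq = [rng.normal(size=(d, d_head)) for _ in range(h)]
Wk = [rng.normal(size=(d, d_head)) for _ in range(h)]
Wv = [rng.normal(size=(d, d_head)) for _ in range(h)]
Wo = rng.normal(size=(h * d_head, d))
print(multi_head_attention(X, Wq, Wk, Wv, Wo).shape)  # (5, 8)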

Towards lower bounds on the depth of ReLU neural networks

C Hertrich, A Basu, M Di Summa… - Advances in Neural …, 2021 - proceedings.neurips.cc
We contribute to a better understanding of the class of functions that is represented by a
neural network with ReLU activations and a given architecture. Using techniques from mixed …
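
As a concrete instance of the representability questions studied here (an illustrative example, not a result quoted from the paper): the maximum of two numbers is exactly computable by a ReLU network with one hidden layer, since $\max(x, y) = \tfrac{1}{2}(x + y) + \tfrac{1}{2}(\mathrm{ReLU}(x - y) + \mathrm{ReLU}(y - x))$, and the linear term $x + y$ can itself be written as $\mathrm{ReLU}(x + y) - \mathrm{ReLU}(-x - y)$; depth lower bounds ask how many such layers are unavoidable as the number of arguments grows.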

Bounding the width of neural networks via coupled initialization: a worst case analysis

A Munteanu, S Omlor, Z Song… - … on Machine Learning, 2022 - proceedings.mlr.press
A common method in training neural networks is to initialize all the weights to be
independent Gaussian vectors. We observe that by instead initializing the weights into …
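
For reference, the baseline scheme the snippet describes is simply one independent Gaussian weight vector per hidden neuron; the paper's alternative "coupled" initialization is elided above and is not reproduced here.

import numpy as np

def gaussian_init(width, dim, seed=0):
    # One independent N(0, I_dim) weight vector per hidden neuron.
    rng = np.random.default_rng(seed)
    return rng.normal(size=(width, dim))

W = gaussian_init(width=128, dim=16)
print(W.shape)  # (128, 16)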

Hardness of noise-free learning for two-hidden-layer neural networks

S Chen, A Gollakota, A Klivans… - Advances in Neural …, 2022 - proceedings.neurips.cc
We give superpolynomial statistical query (SQ) lower bounds for learning two-hidden-layer
ReLU networks with respect to Gaussian inputs in the standard (noise-free) model. No …
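
For concreteness, the function class the lower bound concerns looks like the following sketch: a two-hidden-layer ReLU network evaluated on a standard Gaussian input (the sizes and scalar output form are illustrative; the paper's hard instances are not reproduced here).

import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def two_hidden_layer_net(x, W1, W2, a):
    # a' ReLU(W2 ReLU(W1 x)): two hidden ReLU layers followed by a linear output.
    return a @ relu(W2 @ relu(W1 @ x))

rng = np.random.default_rng(0)
d, m1, m2 = 10, 6, 4
W1 = rng.normal(size=(m1, d))
W2 = rng.normal(size=(m2, m1))
a = rng.normal(size=m2)
x = rng.normal(size=d)  # Gaussian input, matching the learning model above
print(two_hidden_layer_net(x, W1, W2, a))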

Training Fully Connected Neural Networks is ∃ℝ-Complete

D Bertschinger, C Hertrich… - Advances in …, 2023 - proceedings.neurips.cc
We consider the algorithmic problem of finding the optimal weights and biases for a two-
layer fully connected neural network to fit a given set of data points, also known as empirical …
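
Stated informally in a standard formulation (a paraphrase, not the paper's exact wording): given data points $(x_1, y_1), \dots, (x_n, y_n)$ and a threshold $\gamma$, decide whether there exist real weights and biases $W, b, a, c$ such that the two-layer network $f(x) = a^\top \sigma(Wx + b) + c$ with activation $\sigma$ achieves $\sum_{i=1}^{n} (f(x_i) - y_i)^2 \le \gamma$; ∃ℝ-completeness says this decision problem is as hard as deciding existential statements over the reals.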

Learning narrow one-hidden-layer ReLU networks

S Chen, Z Dou, S Goel, A Klivans… - The Thirty Sixth Annual …, 2023 - proceedings.mlr.press
We consider the well-studied problem of learning a linear combination of $k$ ReLU
activations with respect to a Gaussian distribution on inputs in $d$ dimensions. We give the …
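
The target class described above can be written as $f(x) = \sum_{i=1}^{k} a_i \, \mathrm{ReLU}(\langle w_i, x \rangle)$ with $x \sim \mathcal{N}(0, I_d)$; the sketch below just generates labeled samples from such a target (an illustrative data generator, not the paper's learning algorithm).

import numpy as np

def sample_data(n, d, k, seed=0):
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(k, d))        # one weight vector w_i per ReLU unit
    a = rng.normal(size=k)             # mixing coefficients a_i
    X = rng.normal(size=(n, d))        # inputs x ~ N(0, I_d)
    y = np.maximum(X @ W.T, 0.0) @ a   # y = sum_i a_i * ReLU(<w_i, x>)
    return X, y, (W, a)

X, y, target = sample_data(n=1000, d=20, k=3)
print(X.shape, y.shape)  # (1000, 20) (1000,)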

Agnostically learning multi-index models with queries

I Diakonikolas, DM Kane, V Kontonis… - 2024 IEEE 65th …, 2024 - ieeexplore.ieee.org
We study the power of query access for the fundamental task of agnostic learning under the
Gaussian distribution. In the agnostic model, no assumptions are made on the labels of the …
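
For context on the terms used here (standard definitions, not specific to this paper's results): in the agnostic model over the Gaussian, inputs are drawn as $x \sim \mathcal{N}(0, I_d)$, no assumption is placed on the labels, and the learner must output a hypothesis $h$ with error at most $\mathrm{opt} + \epsilon$, where $\mathrm{opt}$ is the error of the best function in the target class; query access means the learner may request labels at points of its own choosing rather than only seeing random examples; a multi-index model is a function $f(x) = g(Wx)$ that depends on $x$ only through a low-dimensional linear projection $Wx$.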