Deep learning: a statistical viewpoint

PL Bartlett, A Montanari, A Rakhlin - Acta Numerica, 2021 - cambridge.org
The remarkable practical success of deep learning has revealed some major surprises from
a theoretical perspective. In particular, simple gradient methods easily find near-optimal …

Dataset distillation with infinitely wide convolutional networks

T Nguyen, R Novak, L Xiao… - Advances in Neural …, 2021 - proceedings.neurips.cc
The effectiveness of machine learning algorithms arises from being able to extract useful
features from large amounts of data. As model and dataset sizes increase, dataset …
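
The distilled images in this line of work serve as the support set of a kernel ridge regression (KRR) predictor whose kernel is the infinite-width convolutional NTK. A minimal numpy sketch of that readout, with an RBF kernel standing in for the conv-NTK (which the authors compute with infinite-width tooling) and purely illustrative shapes:

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.1):
    # Pairwise squared distances -> RBF kernel matrix (stand-in kernel).
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * sq)

def krr_predict(X_support, y_support, X_test, reg=1e-6, kernel=rbf_kernel):
    # Kernel ridge regression: predictions are kernel-weighted
    # combinations of the (distilled) support set.
    K_ss = kernel(X_support, X_support)
    K_ts = kernel(X_test, X_support)
    alpha = np.linalg.solve(K_ss + reg * np.eye(len(X_support)), y_support)
    return K_ts @ alpha

# Toy usage: 10 distilled "images" (flattened), one-hot labels, 5 queries.
rng = np.random.default_rng(0)
X_s, y_s = rng.normal(size=(10, 32 * 32 * 3)), np.eye(10)
X_t = rng.normal(size=(5, 32 * 32 * 3))
preds = krr_predict(X_s, y_s, X_t)   # (5, 10) class scores
```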

High-dimensional asymptotics of feature learning: How one gradient step improves the representation

J Ba, MA Erdogdu, T Suzuki, Z Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a
two-layer neural network: $f(\boldsymbol{x})=\frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma(\boldsymbol{W}^\top\boldsymbol{x})$ …
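
A minimal JAX sketch of the object under study, assuming ReLU for $\sigma$ and a toy target (both illustrative): one gradient step is taken on the first-layer weights $\boldsymbol{W}$ while the second layer $\boldsymbol{a}$ stays fixed, which is the update whose effect on the representation the paper analyzes.

```python
import jax
import jax.numpy as jnp

N, d = 128, 16                                   # width, input dimension
kW, ka, kx = jax.random.split(jax.random.PRNGKey(0), 3)
W = jax.random.normal(kW, (d, N)) / jnp.sqrt(d)  # first layer (trained)
a = jax.random.normal(ka, (N,))                  # second layer (fixed)

def f(W, x):
    # f(x) = a^T sigma(W^T x) / sqrt(N), with sigma = ReLU here.
    return a @ jax.nn.relu(W.T @ x) / jnp.sqrt(N)

def loss(W, X, y):
    preds = jax.vmap(lambda x: f(W, x))(X)
    return jnp.mean((preds - y) ** 2)

X = jax.random.normal(kx, (256, d))
y = jnp.tanh(X[:, 0])                    # toy single-index target
eta = 1.0                                # the paper studies how this scales
W1 = W - eta * jax.grad(loss)(W, X, y)   # the single step on W
```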

Gradient starvation: A learning proclivity in neural networks

M Pezeshki, O Kaba, Y Bengio… - Advances in …, 2021 - proceedings.neurips.cc
We identify and formalize a fundamental gradient descent phenomenon resulting in a
learning proclivity in over-parameterized neural networks. Gradient Starvation arises when …
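
A hypothetical toy illustration of the flavor of the phenomenon (not the paper's formal setup, which works through an NTK-regime dynamical analysis): two redundant features both predict the label, and once the stronger feature has driven the loss down, the gradient flowing to the weaker feature is starved and its weight stays near zero.

```python
import numpy as np

# Toy data: both features alone predict y, but feature 0 has a far
# larger margin than feature 1.
rng = np.random.default_rng(0)
n = 1000
y = rng.choice([-1.0, 1.0], size=n)
X = np.stack([5.0 * y + rng.normal(size=n),       # strong feature
              0.5 * y + rng.normal(size=n)], 1)   # weak feature

w = np.zeros(2)
for _ in range(2000):
    # sigmoid(-y f(x)), written via tanh for numerical stability
    p = 0.5 * (1.0 - np.tanh(0.5 * y * (X @ w)))
    grad = -(X * (y * p)[:, None]).mean(0)        # logistic-loss gradient
    w -= 0.1 * grad

print(w)   # the weak feature's weight stays small: once the strong
           # feature shrinks the loss, its gradient signal is starved
```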

What can a single attention layer learn? A study through the random features lens

H Fu, T Guo, Y Bai, S Mei - Advances in Neural Information …, 2024 - proceedings.neurips.cc
Attention layers---which map a sequence of inputs to a sequence of outputs---are core
building blocks of the Transformer architecture which has achieved significant …
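
A schematic of the random-features view of a single attention layer, with illustrative shapes and the first token acting as the query: the query-key matrices are frozen at random initialization and only a linear readout over the attention outputs is fit, which is the regime the analysis works in (the paper's exact parameterization and scalings differ).

```python
import numpy as np

rng = np.random.default_rng(0)
d, T, M = 8, 6, 64    # embedding dim, sequence length, random heads

# Frozen random query-key matrices: the "random features".
W = rng.normal(size=(M, d, d)) / np.sqrt(d)

def softmax(z):
    z = z - z.max(-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(-1, keepdims=True)

def features(X):
    # X: (T, d). scores[m, t] = x_1^T W_m x_t; each head returns an
    # attention-weighted average of the sequence -> (M * d,) features.
    scores = np.einsum('d,mde,te->mt', X[0], W, X)
    return (softmax(scores) @ X).reshape(-1)

# Only the linear readout v is trained (ridge regression on toy data).
n = 200
Xs = rng.normal(size=(n, T, d))
ys = np.array([x[0] @ x[1:].mean(0) for x in Xs])   # toy target
Phi = np.stack([features(x) for x in Xs])
v = np.linalg.solve(Phi.T @ Phi + 1e-3 * np.eye(M * d), Phi.T @ ys)
```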

TRAK: Attributing model behavior at scale

SM Park, K Georgiev, A Ilyas, G Leclerc… - arXiv preprint arXiv …, 2023 - arxiv.org
The goal of data attribution is to trace model predictions back to training data. Despite a long
line of work towards this goal, existing approaches to data attribution tend to force users to …
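
A schematic of the gradient-projection idea at the core of TRAK, with synthetic stand-in gradients (in the real method they come from the trained model): per-example gradients are compressed by a random projection, and attribution scores fall out of the resulting feature kernel. The full estimator additionally reweights by loss terms and ensembles over retrained models.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 500, 1000, 32   # train examples, parameters, projection dim

# Stand-in per-example gradients of the model output w.r.t. parameters.
G_train = rng.normal(size=(n, d))
g_test = rng.normal(size=(d,))

# JL-style random projection keeps the gradient features cheap to store.
P = rng.normal(size=(d, k)) / np.sqrt(k)
Phi = G_train @ P             # (n, k) projected train gradients
phi = g_test @ P              # (k,)  projected test gradient

# Score each train example's influence on the test prediction via the
# (regularized) projected-gradient kernel.
scores = Phi @ np.linalg.solve(Phi.T @ Phi + 1e-6 * np.eye(k), phi)
top = np.argsort(-scores)[:10]   # most influential training examples
```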

LLM inference unveiled: Survey and roofline model insights

Z Yuan, Y Shang, Y Zhou, Z Dong, Z Zhou… - arXiv preprint arXiv …, 2024 - arxiv.org
The field of efficient Large Language Model (LLM) inference is rapidly evolving, presenting a
unique blend of opportunities and challenges. Although the field has expanded and is …
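
A back-of-envelope example of the roofline reasoning such surveys apply, with illustrative A100-class hardware numbers (swap in your own): generating one token in the decode phase reads every weight once, so arithmetic intensity sits far below the hardware ridge point and decoding is memory-bandwidth-bound.

```python
# Illustrative accelerator numbers, roughly A100-class.
peak_flops = 312e12           # dense BF16 FLOP/s
mem_bw = 2.0e12               # HBM bytes/s
ridge = peak_flops / mem_bw   # ~156 FLOP/byte; below this -> memory-bound

# Decode phase: ~2 FLOPs per parameter per token, 2 bytes per fp16 weight.
decode_intensity = 2 / 2      # 1 FLOP/byte, far below the ridge point

# Weight-bandwidth ceiling on decode speed for a 7B-parameter fp16 model.
tokens_per_sec = mem_bw / (2 * 7e9)
print(ridge, decode_intensity, tokens_per_sec)   # ~156.0, 1.0, ~143 tok/s
```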

Finite versus infinite neural networks: an empirical study

J Lee, S Schoenholz, J Pennington… - Advances in …, 2020 - proceedings.neurips.cc
We perform a careful, thorough, and large scale empirical study of the correspondence
between wide neural networks and kernel methods. By doing so, we resolve a variety of …
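
A minimal sketch of the infinite-width side of such a comparison, using the kernel_fn interface of the authors' neural_tangents library (API usage as I understand it): the NTK of an infinitely wide two-layer ReLU network is evaluated in closed form and used for kernel regression, giving the predictor a finite-width network is compared against.

```python
import jax
import jax.numpy as jnp
from neural_tangents import stax

# Infinite-width two-layer ReLU network; Dense's width only matters for
# the finite-width init/apply functions, not for kernel_fn.
init_fn, apply_fn, kernel_fn = stax.serial(
    stax.Dense(512), stax.Relu(), stax.Dense(1))

k1, k2 = jax.random.split(jax.random.PRNGKey(0))
x_train = jax.random.normal(k1, (20, 10))
y_train = jnp.sin(x_train[:, :1])
x_test = jax.random.normal(k2, (5, 10))

# Closed-form NTK regression: the converged gradient-descent predictions
# of the infinitely wide network under MSE loss.
k_tt = kernel_fn(x_train, x_train, 'ntk')
k_st = kernel_fn(x_test, x_train, 'ntk')
y_ntk = k_st @ jnp.linalg.solve(k_tt + 1e-6 * jnp.eye(20), y_train)
```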

A primer on Bayesian neural networks: review and debates

J Arbel, K Pitas, M Vladimirova, V Fortuin - arXiv preprint arXiv:2309.16314, 2023 - arxiv.org
Neural networks have achieved remarkable performance across various problem domains,
but their widespread applicability is hindered by inherent limitations such as overconfidence …

Bayesian deep ensembles via the neural tangent kernel

B He, B Lakshminarayanan… - Advances in Neural …, 2020 - proceedings.neurips.cc
We explore the link between deep ensembles and Gaussian processes (GPs) through the
lens of the Neural Tangent Kernel (NTK): a recent development in understanding the …
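
A minimal JAX sketch of the empirical NTK on which the deep-ensembles/GP link rests, for a toy two-layer network: the kernel is the Gram matrix of per-example parameter gradients. The paper's construction goes further, modifying each ensemble member so the ensemble's predictive distribution matches the NTK-GP posterior.

```python
import jax
import jax.numpy as jnp

def init_params(key, d, width):
    k1, k2 = jax.random.split(key)
    return {'W': jax.random.normal(k1, (d, width)) / jnp.sqrt(d),
            'a': jax.random.normal(k2, (width,)) / jnp.sqrt(width)}

def f(params, x):
    # Toy two-layer ReLU network with scalar output.
    return params['a'] @ jax.nn.relu(params['W'].T @ x)

def empirical_ntk(params, X1, X2):
    # NTK(x, x') = <df/dtheta (x), df/dtheta (x')> over all parameters.
    grads = lambda X: jax.vmap(jax.grad(f), in_axes=(None, 0))(params, X)
    flat = lambda tree: jnp.concatenate(
        [g.reshape(g.shape[0], -1) for g in jax.tree_util.tree_leaves(tree)], 1)
    return flat(grads(X1)) @ flat(grads(X2)).T

key = jax.random.PRNGKey(0)
params = init_params(key, d=4, width=256)
X = jax.random.normal(key, (8, 4))
K = empirical_ntk(params, X, X)   # (8, 8) kernel of one network draw
```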