Deep learning: a statistical viewpoint
The remarkable practical success of deep learning has revealed some major surprises from
a theoretical perspective. In particular, simple gradient methods easily find near-optimal …
Dataset distillation with infinitely wide convolutional networks
The effectiveness of machine learning algorithms arises from being able to extract useful
features from large amounts of data. As model and dataset sizes increase, dataset …
High-dimensional asymptotics of feature learning: How one gradient step improves the representation
We study the first gradient descent step on the first-layer parameters $\boldsymbol{W}$ in a
two-layer neural network: $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma$ …
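To make the snippet's model concrete, here is a minimal JAX sketch of the two-layer network $f(\boldsymbol{x}) = \frac{1}{\sqrt{N}}\boldsymbol{a}^\top\sigma(\boldsymbol{W}\boldsymbol{x})$ and a single gradient step on $\boldsymbol{W}$. The argument of $\sigma$ is truncated in the snippet, and the ReLU nonlinearity, squared loss, dimensions, and step size below are illustrative assumptions, not details from the paper.

import jax
import jax.numpy as jnp

def f(W, a, x):
    # Two-layer network with N hidden units; sigma = ReLU is an assumed choice.
    N = a.shape[0]
    return jnp.dot(a, jax.nn.relu(W @ x)) / jnp.sqrt(N)

def loss(W, a, x, y):
    # Squared-error loss on one example (illustrative; the paper's setting may differ).
    return 0.5 * (f(W, a, x) - y) ** 2

key = jax.random.PRNGKey(0)
d, N = 8, 32                                   # input/hidden sizes (arbitrary)
kW, ka, kx = jax.random.split(key, 3)
W = jax.random.normal(kW, (N, d)) / jnp.sqrt(d)
a = jax.random.normal(ka, (N,))
x = jax.random.normal(kx, (d,))
y = 1.0

eta = 0.1                                      # step size (arbitrary)
grad_W = jax.grad(loss, argnums=0)(W, a, x, y)
W_new = W - eta * grad_W                       # the first gradient step on W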
Gradient starvation: A learning proclivity in neural networks
We identify and formalize a fundamental gradient descent phenomenon resulting in a
learning proclivity in over-parameterized neural networks. Gradient Starvation arises when …
What can a single attention layer learn? A study through the random features lens
Attention layers, which map a sequence of inputs to a sequence of outputs, are core
building blocks of the Transformer architecture, which has achieved significant …
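As a concrete reference point for the snippet, below is a minimal JAX sketch of a single softmax attention layer mapping a length-T input sequence to a length-T output sequence. The parameter names, sizes, and single-head form are illustrative assumptions; the paper's random-features analysis is not reproduced here.

import jax
import jax.numpy as jnp

def attention_layer(params, X):
    # X: (T, d) inputs; returns a (T, d_v) sequence of outputs.
    Q = X @ params["W_q"]                      # (T, d_k) queries
    K = X @ params["W_k"]                      # (T, d_k) keys
    V = X @ params["W_v"]                      # (T, d_v) values
    scores = Q @ K.T / jnp.sqrt(Q.shape[-1])   # (T, T) scaled dot products
    weights = jax.nn.softmax(scores, axis=-1)  # each row sums to 1
    return weights @ V

key = jax.random.PRNGKey(0)
T, d, d_k, d_v = 5, 8, 4, 4                    # arbitrary sizes
kq, kk, kv, kx = jax.random.split(key, 4)
params = {
    "W_q": jax.random.normal(kq, (d, d_k)),
    "W_k": jax.random.normal(kk, (d, d_k)),
    "W_v": jax.random.normal(kv, (d, d_v)),
}
X = jax.random.normal(kx, (T, d))
Y = attention_layer(params, X)                 # sequence in, sequence out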
TRAK: Attributing model behavior at scale
The goal of data attribution is to trace model predictions back to training data. Despite a long
line of work towards this goal, existing approaches to data attribution tend to force users to …
LLM inference unveiled: Survey and roofline model insights
The field of efficient Large Language Model (LLM) inference is rapidly evolving, presenting a
unique blend of opportunities and challenges. Although the field has expanded and is …
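The roofline model in the title reduces to a short calculation: an operation is memory-bound when its arithmetic intensity (FLOPs per byte moved) falls below the hardware's compute-to-bandwidth ratio. The sketch below applies this to the matrix-vector product at the heart of single-token LLM decoding; the hardware figures are rough public A100 numbers used only for illustration, not values from the survey.

peak_flops = 312e12        # ~A100 fp16 tensor-core peak, FLOP/s (approximate)
peak_bw = 2.0e12           # ~A100 HBM bandwidth, bytes/s (approximate)
ridge = peak_flops / peak_bw          # intensity where compute and memory balance

m, n = 4096, 4096          # weight-matrix shape (arbitrary layer size)
flops = 2 * m * n          # multiply-adds in one matvec
bytes_moved = 2 * m * n    # fp16 weights read once (~2 bytes each) dominate traffic
intensity = flops / bytes_moved       # ~1 FLOP/byte

attainable = min(peak_flops, intensity * peak_bw)
print(f"ridge ~ {ridge:.0f} FLOP/byte, matvec intensity ~ {intensity:.0f}")
print("memory-bound" if intensity < ridge else "compute-bound")
print(f"attainable ~ {attainable / 1e12:.1f} TFLOP/s of {peak_flops / 1e12:.0f} peak")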
Finite versus infinite neural networks: an empirical study
We perform a careful, thorough, and large scale empirical study of the correspondence
between wide neural networks and kernel methods. By doing so, we resolve a variety of …
A primer on Bayesian neural networks: review and debates
Neural networks have achieved remarkable performance across various problem domains,
but their widespread applicability is hindered by inherent limitations such as overconfidence …
Bayesian deep ensembles via the neural tangent kernel
We explore the link between deep ensembles and Gaussian processes (GPs) through the
lens of the Neural Tangent Kernel (NTK): a recent development in understanding the …
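To ground the NTK reference, here is a minimal JAX sketch of the empirical Neural Tangent Kernel, $\Theta(x, x') = \langle \partial f(x)/\partial\theta,\, \partial f(x')/\partial\theta \rangle$, for a tiny scalar-output network. The architecture and sizes are arbitrary assumptions, and the paper's Bayesian-ensemble construction itself is not reproduced.

import jax
import jax.numpy as jnp

def f(params, x):
    # Small scalar-output MLP (illustrative architecture).
    h = jnp.tanh(params["W1"] @ x + params["b1"])
    return jnp.dot(params["w2"], h)

def ntk(params, x1, x2):
    # Inner product of parameter gradients at two inputs.
    g1 = jax.grad(f)(params, x1)
    g2 = jax.grad(f)(params, x2)
    v1 = jnp.concatenate([jnp.ravel(g) for g in jax.tree_util.tree_leaves(g1)])
    v2 = jnp.concatenate([jnp.ravel(g) for g in jax.tree_util.tree_leaves(g2)])
    return jnp.dot(v1, v2)

key = jax.random.PRNGKey(0)
d, N = 4, 16
kW, kw, kx = jax.random.split(key, 3)
params = {
    "W1": jax.random.normal(kW, (N, d)) / jnp.sqrt(d),
    "b1": jnp.zeros(N),
    "w2": jax.random.normal(kw, (N,)) / jnp.sqrt(N),
}
xs = jax.random.normal(kx, (3, d))
K = jnp.array([[ntk(params, xa, xb) for xb in xs] for xa in xs])  # 3x3 Gram matrix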