Structured pruning for deep convolutional neural networks: A survey
The remarkable performance of deep convolutional neural networks (CNNs) is generally
attributed to their deeper and wider architectures, which can come with significant …
A survey of uncertainty in deep neural networks
Over the last decade, neural networks have reached almost every field of science and
become a crucial part of various real-world applications. Due to the increasing spread …
Laplace redux - effortless Bayesian deep learning
Bayesian formulations of deep learning have been shown to have compelling theoretical
properties and offer practical functional benefits, such as improved predictive uncertainty …
Sophia: A scalable stochastic second-order optimizer for language model pre-training
Given the massive cost of language model pre-training, a non-trivial improvement of the
optimization algorithm would lead to a material reduction in the time and cost of training …
Studying large language model generalization with influence functions
When trying to gain better visibility into a machine learning model in order to understand and
mitigate the associated risks, a potentially valuable source of evidence is: which training …
Make sharpness-aware minimization stronger: A sparsified perturbation approach
Deep neural networks often suffer from poor generalization caused by complex and non-convex loss landscapes. One of the popular solutions is Sharpness-Aware Minimization …
Limitations of the empirical Fisher approximation for natural gradient descent
Natural gradient descent, which preconditions a gradient descent update with the Fisher
information matrix of the underlying statistical model, is a way to capture partial second …
Unleashing the power of data tsunami: A comprehensive survey on data assessment and selection for instruction tuning of language models
Instruction tuning plays a critical role in aligning large language models (LLMs) with human
preference. Despite the vast number of open instruction datasets, naively training an LLM on …
DataInf: Efficiently estimating data influence in LoRA-tuned LLMs and diffusion models
Quantifying the impact of training data points is crucial for understanding the outputs of
machine learning models and for improving the transparency of the AI pipeline. The …
Which algorithmic choices matter at which batch sizes? Insights from a noisy quadratic model
Increasing the batch size is a popular way to speed up neural network training, but beyond
some critical batch size, larger batch sizes yield diminishing returns. In this work, we study …