Structured pruning for deep convolutional neural networks: A survey

Y He, L Xiao - IEEE Transactions on Pattern Analysis and …, 2023 - ieeexplore.ieee.org
The remarkable performance of deep convolutional neural networks (CNNs) is generally
attributed to their deeper and wider architectures, which can come with significant …

A survey of uncertainty in deep neural networks

J Gawlikowski, CRN Tassi, M Ali, J Lee, M Humt… - Artificial Intelligence …, 2023 - Springer
Over the last decade, neural networks have reached almost every field of science and
become a crucial part of various real-world applications. Due to the increasing spread …

Laplace redux - effortless Bayesian deep learning

E Daxberger, A Kristiadi, A Immer… - Advances in …, 2021 - proceedings.neurips.cc
Bayesian formulations of deep learning have been shown to have compelling theoretical
properties and offer practical functional benefits, such as improved predictive uncertainty …
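
As a point of reference (a sketch, not taken from the snippet), the Laplace approximation underlying this line of work fits a Gaussian to the posterior around an already-trained (MAP) estimate \hat{\theta}:

p(\theta \mid \mathcal{D}) \approx \mathcal{N}\!\left(\hat{\theta},\, \Sigma\right), \qquad \Sigma = \left( \nabla_\theta^2 \left[ -\log p(\mathcal{D} \mid \theta) - \log p(\theta) \right] \Big|_{\theta = \hat{\theta}} \right)^{-1},

so predictive uncertainty is obtained from curvature at the trained network rather than from retraining, which is what the "effortless" in the title refers to.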

Sophia: A scalable stochastic second-order optimizer for language model pre-training

H Liu, Z Li, D Hall, P Liang, T Ma - arXiv preprint arXiv:2305.14342, 2023 - arxiv.org
Given the massive cost of language model pre-training, a non-trivial improvement of the
optimization algorithm would lead to a material reduction in the time and cost of training …

Studying large language model generalization with influence functions

R Grosse, J Bae, C Anil, N Elhage, A Tamkin… - arXiv preprint arXiv …, 2023 - arxiv.org
When trying to gain better visibility into a machine learning model in order to understand and
mitigate the associated risks, a potentially valuable source of evidence is: which training …
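
For orientation (the standard influence-function form, with symbols assumed rather than quoted from the paper), the influence of a training point z on the loss at a query z_q is typically estimated as

\mathcal{I}(z, z_q) = -\,\nabla_\theta L(z_q, \hat{\theta})^{\top} H_{\hat{\theta}}^{-1}\, \nabla_\theta L(z, \hat{\theta}), \qquad H_{\hat{\theta}} = \frac{1}{n}\sum_{i=1}^{n} \nabla_\theta^2 L(z_i, \hat{\theta}),

where \hat{\theta} are the trained parameters; at LLM scale the Hessian-inverse-vector product is the expensive part and is usually approximated (e.g., with Kronecker-factored curvature).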

Make sharpness-aware minimization stronger: A sparsified perturbation approach

P Mi, L Shen, T Ren, Y Zhou, X Sun… - Advances in Neural …, 2022 - proceedings.neurips.cc
Deep neural networks often suffer from poor generalization caused by complex and non-
convex loss landscapes. One of the popular solutions is Sharpness-Aware Minimization …
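
For context, the standard SAM objective and its usual one-step approximation of the worst-case perturbation can be written (notation assumed) as

\min_{\theta} \max_{\|\epsilon\|_2 \le \rho} L(\theta + \epsilon), \qquad \hat{\epsilon}(\theta) = \rho\, \frac{\nabla_\theta L(\theta)}{\|\nabla_\theta L(\theta)\|_2},

with the parameter update taken along \nabla_\theta L(\theta + \hat{\epsilon}); the sparsified-perturbation approach in the title presumably restricts the perturbation \epsilon to a sparse subset of coordinates.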

Limitations of the empirical Fisher approximation for natural gradient descent

F Kunstner, P Hennig, L Balles - Advances in neural …, 2019 - proceedings.neurips.cc
Natural gradient descent, which preconditions a gradient descent update with the Fisher
information matrix of the underlying statistical model, is a way to capture partial second …
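
As a sketch of the update described here (symbols assumed), natural gradient descent preconditions the step with the Fisher information matrix F:

\theta_{t+1} = \theta_t - \eta\, F(\theta_t)^{-1} \nabla_\theta L(\theta_t), \qquad F(\theta) = \mathbb{E}_{y \sim p_\theta(\cdot \mid x)}\!\left[ \nabla_\theta \log p_\theta(y \mid x)\, \nabla_\theta \log p_\theta(y \mid x)^{\top} \right],

whereas the "empirical Fisher" of the title replaces the model-sampled labels y \sim p_\theta(\cdot \mid x) with the observed training labels, which is the approximation whose limitations the paper examines.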

Unleashing the power of data tsunami: A comprehensive survey on data assessment and selection for instruction tuning of language models

Y Qin, Y Yang, P Guo, G Li, H Shao, Y Shi, Z Xu… - arXiv preprint arXiv …, 2024 - arxiv.org
Instruction tuning plays a critical role in aligning large language models (LLMs) with human
preferences. Despite the vast amount of open instruction datasets, naively training an LLM on …

DataInf: Efficiently estimating data influence in LoRA-tuned LLMs and diffusion models

Y Kwon, E Wu, K Wu, J Zou - arXiv preprint arXiv:2310.00902, 2023 - arxiv.org
Quantifying the impact of training data points is crucial for understanding the outputs of
machine learning models and for improving the transparency of the AI pipeline. The …

Which algorithmic choices matter at which batch sizes? Insights from a noisy quadratic model

G Zhang, L Li, Z Nado, J Martens… - Advances in neural …, 2019 - proceedings.neurips.cc
Increasing the batch size is a popular way to speed up neural network training, but beyond
some critical batch size, larger batch sizes yield diminishing returns. In this work, we study …
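
A minimal sketch of the kind of model the title refers to (all symbols assumed): a convex quadratic optimized with gradient estimates whose noise covariance scales inversely with the batch size B,

L(\theta) = \tfrac{1}{2}\, \theta^{\top} H \theta, \qquad g_B(\theta) = H\theta + \epsilon, \quad \epsilon \sim \mathcal{N}\!\left(0,\, \tfrac{1}{B} C\right),

so doubling B halves the gradient-noise variance but does nothing about the curvature-limited part of the step, which is one way to see why larger batches eventually give diminishing returns.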