A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations

H Cheng, M Zhang, JQ Shi - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Modern deep neural networks, particularly recent large language models, come with
massive model sizes that require significant computational and storage resources. To …

QuaRot: Outlier-free 4-bit inference in rotated LLMs

S Ashkboos, A Mohtashami, ML Croci, B Li… - arxiv preprint arxiv …, 2024 - arxiv.org
We introduce QuaRot, a new Quantization scheme based on Rotations, which is able to
quantize LLMs end-to-end, including all weights, activations, and KV cache in 4 bits. QuaRot …
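QuaRot's premise is that multiplying the activations and weights of a linear layer by a shared orthogonal matrix leaves the output unchanged while spreading activation outliers across channels, so 4-bit quantization loses less precision. Below is a minimal numpy sketch of that rotate-then-quantize idea; the random orthogonal matrix, toy shapes, and per-tensor INT4 quantizer are illustrative assumptions, whereas the paper uses Hadamard rotations fused into the model and also covers the KV cache.

```python
import numpy as np

def quantize_int4(t):
    """Symmetric per-tensor 4-bit quantization (illustrative, not QuaRot's exact scheme)."""
    scale = np.abs(t).max() / 7.0
    q = np.clip(np.round(t / scale), -8, 7)
    return q * scale  # dequantized values, used here to measure quantization error

rng = np.random.default_rng(0)
d, n = 512, 64
W = rng.normal(size=(d, d))        # weights of a linear layer: y = x @ W.T
x = rng.normal(size=(n, d))
x[:, :4] *= 50.0                   # a few outlier channels, as seen in LLM activations

Q, _ = np.linalg.qr(rng.normal(size=(d, d)))  # random orthogonal rotation (QuaRot uses Hadamard)

y_ref = x @ W.T

# Quantize directly: the outlier channels dominate the scale and hurt accuracy.
y_plain = quantize_int4(x) @ quantize_int4(W).T

# Rotate first: (x Q)(W Q).T == x W.T exactly, but the rotated tensors have no extreme channels.
y_rot = quantize_int4(x @ Q) @ quantize_int4(W @ Q).T

print("error without rotation:", np.linalg.norm(y_ref - y_plain) / np.linalg.norm(y_ref))
print("error with rotation:   ", np.linalg.norm(y_ref - y_rot) / np.linalg.norm(y_ref))
```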

A comprehensive survey of small language models in the era of large language models: Techniques, enhancements, applications, collaboration with LLMs, and …

F Wang, Z Zhang, X Zhang, Z Wu, T Mo, Q Lu… - arxiv preprint arxiv …, 2024 - ai.radensa.ru
Large language models (LLMs) have demonstrated emergent abilities in text generation,
question answering, and reasoning, facilitating various tasks and domains. Despite their …

Compact language models via pruning and knowledge distillation

S Muralidharan, ST Sreenivas, RB Joshi… - The Thirty-eighth …, 2024 - openreview.net
Large language models (LLMs) targeting different deployment scales and sizes are currently
produced by training each variant from scratch; this is extremely compute-intensive. In this …
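The title names the two ingredients: prune a large pretrained model, then recover accuracy by distilling from the original model instead of retraining the small variant from scratch. As a rough illustration of the distillation half, here is a sketch of a standard temperature-scaled distillation loss; the temperature, mixing weight, and toy logits are assumptions for illustration, not the paper's recipe.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Mix hard-label cross-entropy with KL(teacher || student) on temperature-softened logits."""
    p_t = softmax(teacher_logits, T)
    log_p_s = np.log(softmax(student_logits, T) + 1e-12)
    kd = (p_t * (np.log(p_t + 1e-12) - log_p_s)).sum(axis=-1).mean() * T * T
    ce = -np.log(softmax(student_logits)[np.arange(len(labels)), labels] + 1e-12).mean()
    return alpha * kd + (1 - alpha) * ce

# Toy usage: 4 tokens, vocabulary of 8.
rng = np.random.default_rng(0)
teacher = rng.normal(size=(4, 8))
student = rng.normal(size=(4, 8))
labels = rng.integers(0, 8, size=4)
print(distillation_loss(student, teacher, labels))
```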

Assessing the brittleness of safety alignment via pruning and low-rank modifications

B Wei, K Huang, Y Huang, T Xie, X Qi, M Xia… - arxiv preprint arxiv …, 2024 - arxiv.org
Large language models (LLMs) show inherent brittleness in their safety mechanisms, as
evidenced by their susceptibility to jailbreaking and even non-malicious fine-tuning. This …

ShortGPT: Layers in large language models are more redundant than you expect

X Men, M Xu, Q Zhang, B Wang, H Lin, Y Lu… - arxiv preprint arxiv …, 2024 - arxiv.org
As Large Language Models (LLMs) continue to advance in performance, their size has
escalated significantly, with current LLMs containing billions or even trillions of parameters …
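The redundancy claim can be made concrete with a simple layer-importance score: a block whose output is nearly parallel to its input contributes little and is a candidate for removal. The sketch below scores layers by one minus the mean cosine similarity between a block's input and output hidden states, in the spirit of ShortGPT's Block Influence metric; the toy model, dimensions, and random hidden states are placeholders.

```python
import numpy as np

def block_influence(h_in, h_out):
    """Score a layer by how much it changes its input: 1 - mean cosine similarity
    between the hidden states entering and leaving the block (low score = redundant)."""
    cos = (h_in * h_out).sum(-1) / (np.linalg.norm(h_in, axis=-1) * np.linalg.norm(h_out, axis=-1))
    return 1.0 - cos.mean()

# Placeholder hidden states for a toy 6-layer model (tokens x hidden dim at each layer boundary).
rng = np.random.default_rng(0)
states = [rng.normal(size=(32, 128))]
for _ in range(6):
    # Each "layer" is simulated as a residual update of varying magnitude.
    states.append(states[-1] + rng.normal(scale=rng.uniform(0.01, 1.0), size=(32, 128)))

scores = [block_influence(states[i], states[i + 1]) for i in range(6)]
prune_order = np.argsort(scores)  # layers with the smallest scores are candidates for removal
print("layer scores:", np.round(scores, 3))
print("prune first:", prune_order[:2])
```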

MaskLLM: Learnable semi-structured sparsity for large language models

G Fang, H Yin, S Muralidharan, G Heinrich… - arxiv preprint arxiv …, 2024 - arxiv.org
Large Language Models (LLMs) are distinguished by their massive parameter counts, which
typically result in significant redundancy. This work introduces MaskLLM, a learnable …
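Semi-structured (N:M) sparsity keeps N nonzero weights in every group of M consecutive weights, a pattern modern GPUs can accelerate. For reference, the sketch below builds a plain magnitude-based 2:4 mask; MaskLLM instead learns which two weights per group to keep via differentiable mask sampling, which this toy example does not attempt.

```python
import numpy as np

def magnitude_2_4_mask(W):
    """Build a 2:4 mask: in every group of 4 consecutive weights along the input dim,
    keep the 2 with the largest magnitude. (Baseline heuristic; MaskLLM learns the mask.)"""
    rows, cols = W.shape
    groups = np.abs(W).reshape(rows, cols // 4, 4)
    order = np.argsort(groups, axis=-1)                       # ascending by magnitude
    mask = np.ones_like(groups, dtype=bool)
    np.put_along_axis(mask, order[..., :2], False, axis=-1)   # zero out the 2 smallest per group
    return mask.reshape(rows, cols)

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 16))
mask = magnitude_2_4_mask(W)
W_sparse = W * mask
assert (mask.reshape(8, -1, 4).sum(-1) == 2).all()  # exactly 2 of every 4 weights survive
print("kept fraction:", mask.mean())                 # 0.5
```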

The LLM surgeon

TFA van der Ouderaa, M Nagel, M Van Baalen… - arxiv preprint arxiv …, 2023 - arxiv.org
State-of-the-art language models are becoming increasingly large in an effort to achieve the
highest performance on large corpora of available textual data. However, the sheer size of …

MobiLlama: Towards accurate and lightweight fully transparent GPT

O Thawakar, A Vayani, S Khan, H Cholakal… - arxiv preprint arxiv …, 2024 - arxiv.org
" Bigger the better" has been the predominant trend in recent Large Language Models
(LLMs) development. However, LLMs do not suit well for scenarios that require on-device …

A deeper look at depth pruning of LLMs

SA Siddiqui, X Dong, G Heinrich, T Breuel… - arxiv preprint arxiv …, 2024 - arxiv.org
Large Language Models (LLMs) are not only resource-intensive to train but even more
costly to deploy in production. Therefore, recent work has attempted to prune blocks of LLMs …