A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations

H Cheng, M Zhang, JQ Shi - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Modern deep neural networks, particularly recent large language models, come with
massive model sizes that require significant computational and storage resources. To …

A comprehensive survey of small language models in the era of large language models: Techniques, enhancements, applications, collaboration with LLMs, and …

F Wang, Z Zhang, X Zhang, Z Wu, T Mo, Q Lu… - arXiv preprint arXiv …, 2024 - ai.radensa.ru
Large language models (LLMs) have demonstrated emergent abilities in text generation,
question answering, and reasoning, facilitating various tasks and domains. Despite their …

MaskLLM: Learnable semi-structured sparsity for large language models

G Fang, H Yin, S Muralidharan, G Heinrich… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) are distinguished by their massive parameter counts, which
typically result in significant redundancy. This work introduces MaskLLM, a learnable …

Transformer layers as painters

Q Sun, M Pickett, AK Nain, L Jones - arXiv preprint arXiv:2407.09298, 2024 - arxiv.org
Despite their nearly universal adoption for large language models, the internal workings of
transformers are not well understood. We aim to better understand the impact of removing or …

ModeGPT: Modular decomposition for large language model compression

CH Lin, S Gao, JS Smith, A Patel, S Tuli, Y Shen… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have reshaped the landscape of artificial intelligence by
demonstrating exceptional performance across various tasks. However, substantial …

AlphaPruning: Using heavy-tailed self-regularization theory for improved layer-wise pruning of large language models

H Lu, Y Zhou, S Liu, Z Wang, MW Mahoney… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent work on pruning large language models (LLMs) has shown that one can eliminate a
large number of parameters without compromising performance, making pruning a …

A deeper look at depth pruning of LLMs

SA Siddiqui, X Dong, G Heinrich, T Breuel… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) are not only resource-intensive to train but even more
costly to deploy in production. Therefore, recent work has attempted to prune blocks of LLMs …

SLMRec: Empowering small language models for sequential recommendation

W Xu, Q Wu, Z Liang, J Han, X Ning, Y Shi… - arXiv preprint arXiv …, 2024 - arxiv.org
The Sequential Recommendation (SR) task involves predicting the next item a user is likely to
interact with, given their past interactions. The SR models examine the sequence of a user's …