A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations
Modern deep neural networks, particularly recent large language models, come with
massive model sizes that require significant computational and storage resources. To …
A comprehensive survey of small language models in the era of large language models: Techniques, enhancements, applications, collaboration with LLMs, and …
Large language models (LLMs) have demonstrated emergent abilities in text generation,
question answering, and reasoning, facilitating various tasks and domains. Despite their …
MaskLLM: Learnable semi-structured sparsity for large language models
Large Language Models (LLMs) are distinguished by their massive parameter counts, which
typically result in significant redundancy. This work introduces MaskLLM, a learnable …
Transformer layers as painters
Despite their nearly universal adoption for large language models, the internal workings of
transformers are not well understood. We aim to better understand the impact of removing or …
MoDeGPT: Modular decomposition for large language model compression
Large Language Models (LLMs) have reshaped the landscape of artificial intelligence by
demonstrating exceptional performance across various tasks. However, substantial …
AlphaPruning: Using heavy-tailed self-regularization theory for improved layer-wise pruning of large language models
Recent work on pruning large language models (LLMs) has shown that one can eliminate a
large number of parameters without compromising performance, making pruning a …
A deeper look at depth pruning of LLMs
Large Language Models (LLMs) are not only resource-intensive to train but even more
costly to deploy in production. Therefore, recent work has attempted to prune blocks of LLMs …
SLMRec: Empowering small language models for sequential recommendation
The Sequential Recommendation (SR) task involves predicting the next item a user is likely to
interact with, given their past interactions. The SR models examine the sequence of a user's …