A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations
Modern deep neural networks, particularly recent large language models, come with
massive model sizes that require significant computational and storage resources. To …
QuaRot: Outlier-free 4-bit inference in rotated LLMs
We introduce QuaRot, a new Quantization scheme based on Rotations, which is able to
quantize LLMs end-to-end, including all weights, activations, and KV cache in 4 bits. QuaRot …
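The snippet only states the result; as a rough, hedged illustration of why rotations help, the NumPy sketch below applies a random orthogonal rotation (a stand-in for QuaRot's Hadamard-based rotations, not its actual pipeline) before naive symmetric 4-bit quantization, so that a single activation outlier channel no longer dominates the quantization scale. All names and toy shapes are invented for illustration.

```python
import numpy as np

def random_orthogonal(d, seed=0):
    # Random orthogonal matrix via QR; QuaRot itself uses Hadamard-based
    # rotations, this is only a stand-in for illustration.
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((d, d)))
    return q

def quantize_int4(t):
    # Naive symmetric per-tensor 4-bit quantization (levels in [-8, 7]).
    scale = np.abs(t).max() / 7.0
    return np.clip(np.round(t / scale), -8, 7) * scale

d_in, d_out = 64, 32
rng = np.random.default_rng(1)
x = rng.standard_normal((4, d_in))
W = rng.standard_normal((d_in, d_out))
x[:, 0] *= 20.0  # inject an activation outlier channel

Q = random_orthogonal(d_in)
# Computational invariance: (x Q)(Q^T W) == x W, so we can quantize the
# rotated tensors instead; the rotation spreads the outlier over all channels.
y_ref = x @ W
y_plain = quantize_int4(x) @ quantize_int4(W)
y_rot = quantize_int4(x @ Q) @ quantize_int4(Q.T @ W)
print("plain 4-bit error:  ", np.abs(y_ref - y_plain).mean())
print("rotated 4-bit error:", np.abs(y_ref - y_rot).mean())
```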
A comprehensive survey of small language models in the era of large language models: Techniques, enhancements, applications, collaboration with llms, and …
Large language models (LLMs) have demonstrated emergent abilities in text generation,
question answering, and reasoning, facilitating various tasks and domains. Despite their …
Compact language models via pruning and knowledge distillation
Large language models (LLMs) targeting different deployment scales and sizes are currently
produced by training each variant from scratch; this is extremely compute-intensive. In this …
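The approach described here combines structured pruning with distillation-based retraining; as a hedged sketch (a generic recipe, not this paper's actual training setup), the PyTorch snippet below shows a standard knowledge-distillation objective in which a pruned student matches the original teacher's softened output distribution while also fitting the ground-truth tokens. Temperatures, weights, and shapes are illustrative.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Generic knowledge-distillation objective: KL between softened teacher and
    # student distributions, blended with the usual cross-entropy on labels.
    kd = F.kl_div(F.log_softmax(student_logits / T, dim=-1),
                  F.softmax(teacher_logits / T, dim=-1),
                  reduction="batchmean") * T * T
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Toy tensors standing in for one batch of next-token logits.
student_logits = torch.randn(8, 32000, requires_grad=True)
teacher_logits = torch.randn(8, 32000)
labels = torch.randint(0, 32000, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```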
Assessing the brittleness of safety alignment via pruning and low-rank modifications
Large language models (LLMs) show inherent brittleness in their safety mechanisms, as
evidenced by their susceptibility to jailbreaking and even non-malicious fine-tuning. This …
ShortGPT: Layers in large language models are more redundant than you expect
As Large Language Models (LLMs) continue to advance in performance, their size has
escalated significantly, with current LLMs containing billions or even trillions of parameters …
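ShortGPT scores whole layers by how little they change their input and removes the lowest-scoring ones. Below is a minimal NumPy sketch of such a redundancy score (one minus the cosine similarity between a block's input and output hidden states, in the spirit of the paper's Block Influence metric); the calibration activations, layer count, and variable names are invented for illustration.

```python
import numpy as np

def block_influence(h_in, h_out, eps=1e-8):
    # Redundancy proxy: 1 - cosine similarity between a block's input and
    # output hidden states. Values near 0 mean the block barely changes its
    # input and is a candidate for removal.
    num = (h_in * h_out).sum(-1)
    den = np.linalg.norm(h_in, axis=-1) * np.linalg.norm(h_out, axis=-1) + eps
    return 1.0 - (num / den).mean()

# hidden_states[l] holds the activations entering layer l on some calibration
# text; hidden_states[l + 1] is that layer's output (shapes are illustrative).
hidden_states = [np.random.randn(16, 128, 768) for _ in range(13)]
scores = [block_influence(hidden_states[l], hidden_states[l + 1])
          for l in range(12)]

# Drop the k layers with the lowest scores.
k = 3
layers_to_prune = sorted(np.argsort(scores)[:k].tolist())
print("candidate layers to remove:", layers_to_prune)
```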
MaskLLM: Learnable semi-structured sparsity for large language models
Large Language Models (LLMs) are distinguished by their massive parameter counts, which
typically result in significant redundancy. This work introduces MaskLLM, a learnable …
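MaskLLM targets semi-structured (e.g. 2:4) sparsity but learns the masks end-to-end rather than choosing them by a fixed rule. The NumPy sketch below only illustrates what a 2:4 pattern is, using magnitude-based selection as a stand-in for the learned masks; it is not the paper's method.

```python
import numpy as np

def mask_2_of_4(w):
    # Semi-structured 2:4 sparsity: in every group of 4 consecutive weights,
    # keep the 2 largest magnitudes and zero the rest. MaskLLM *learns* these
    # masks end-to-end; magnitude selection here is only for illustration.
    flat = w.reshape(-1, 4)
    keep = np.argsort(np.abs(flat), axis=1)[:, 2:]   # indices of top-2 per group
    mask = np.zeros_like(flat)
    np.put_along_axis(mask, keep, 1.0, axis=1)
    return (flat * mask).reshape(w.shape), mask.reshape(w.shape)

w = np.random.randn(8, 16)
w_sparse, mask = mask_2_of_4(w)
assert mask.reshape(-1, 4).sum(axis=1).max() == 2    # exactly 2 kept per group
print("density:", mask.mean())                        # 0.5 by construction
```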
The LLM surgeon
State-of-the-art language models are becoming increasingly large in an effort to achieve the
highest performance on large corpora of available textual data. However, the sheer size of …
MobiLlama: Towards accurate and lightweight fully transparent GPT
" Bigger the better" has been the predominant trend in recent Large Language Models
(LLMs) development. However, LLMs do not suit well for scenarios that require on-device …
A deeper look at depth pruning of LLMs
Large Language Models (LLMs) are not only resource-intensive to train but even more
costly to deploy in production. Therefore, recent work has attempted to prune blocks of LLMs …