A simple and effective pruning approach for large language models

M Sun, Z Liu, A Bair, JZ Kolter - arXiv preprint arXiv:2306.11695, 2023 - arxiv.org
As their size increases, Large Language Models (LLMs) are natural candidates for network
pruning methods: approaches that drop a subset of network weights while striving to …
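
A minimal sketch in the spirit of the paper's magnitude-times-activation-norm score, assuming a per-output-row comparison group and a precomputed vector of input-activation norms; the function name, shapes, and sparsity level are illustrative, not the paper's code:

import numpy as np

def prune_by_weight_activation(W, act_norm, sparsity):
    """Score each weight by |w_ij| * ||x_j||_2 and zero the lowest-scoring
    fraction within each output row."""
    scores = np.abs(W) * act_norm[None, :]       # (out, in) importance scores
    k = int(W.shape[1] * sparsity)               # weights to drop per row
    pruned = W.copy()
    if k == 0:
        return pruned
    drop = np.argpartition(scores, k - 1, axis=1)[:, :k]   # k smallest per row
    np.put_along_axis(pruned, drop, 0.0, axis=1)
    return pruned

# Example: W is (out_features, in_features); act_norm holds the l2 norm of each
# input feature measured on a small calibration set.
W = np.random.randn(8, 16)
act_norm = np.abs(np.random.randn(16))
W_sparse = prune_by_weight_activation(W, act_norm, sparsity=0.5)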

Everybody prune now: Structured pruning of LLMs with only forward passes

L Dery, S Kolawole, JF Kagy, V Smith, G Neubig… - arXiv preprint arXiv …, 2024 - arxiv.org
Given the generational gap in available hardware between lay practitioners and the most
endowed institutions, LLMs are becoming increasingly inaccessible as they grow in size …

FALCON: FLOP-Aware Combinatorial Optimization for Neural Network Pruning

X Meng, W Chen, R Benbaki… - International …, 2024 - proceedings.mlr.press
The increasing computational demands of modern neural networks present deployment
challenges on resource-constrained devices. Network pruning offers a solution to reduce …

MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning

M Farina, M Mancini, E Cunegatti… - Proceedings of the …, 2024 - openaccess.thecvf.com
While excellent in transfer learning, Vision-Language Models (VLMs) come with high
computational costs due to their large number of parameters. To address this issue …

Multi-objective evolutionary architectural pruning of deep convolutional neural networks with weights inheritance

KT Chung, CKM Lee, YP Tsang, CH Wu… - Information Sciences, 2024 - Elsevier
Despite the ongoing success of artificial intelligence applications, the deployment of deep
learning models on end devices remains challenging due to the limited onboard …

The Iterative Optimal Brain Surgeon: Faster Sparse Recovery by Leveraging Second-Order Information

D Wu, IV Modoranu, M Safaryan… - Advances in …, 2025 - proceedings.neurips.cc
The rising footprint of machine learning has led to a focus on imposing model sparsity as a
means of reducing computational and memory costs. For deep neural networks (DNNs), the …
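
For reference, a sketch of the classical Optimal Brain Surgeon step that the title builds on (standard textbook notation, not this paper's iterative variant): with a local quadratic model of the loss with Hessian H, the weight w_q with the smallest saliency is removed and the remaining weights are adjusted to compensate:

  \rho_q = \frac{w_q^2}{2\,[H^{-1}]_{qq}}, \qquad
  \delta w = -\frac{w_q}{[H^{-1}]_{qq}} \, H^{-1} e_q

where e_q is the q-th standard basis vector and [H^{-1}]_{qq} is the q-th diagonal entry of the inverse Hessian.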

OATS: Outlier-aware pruning through sparse and low rank decomposition

S Zhang, V Papyan - arXiv preprint arXiv:2409.13652, 2024 - arxiv.org
The recent paradigm shift to large-scale foundation models has brought about a new era for
deep learning that, while it has found great success in practice, has also been plagued by …
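
A minimal sketch of a sparse-plus-low-rank split of a weight matrix, in the spirit of the title; the truncated-SVD low-rank part, the top-magnitude sparse residual, and the rank/keep-fraction settings are illustrative assumptions, not the paper's outlier-aware procedure:

import numpy as np

def sparse_plus_low_rank(W, rank, keep_frac):
    """Approximate W as L + S with L low-rank and S sparse."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    L = (U[:, :rank] * s[:rank]) @ Vt[:rank, :]    # rank-`rank` component
    R = W - L                                      # residual to sparsify
    S = np.zeros_like(R)
    k = int(R.size * keep_frac)                    # number of entries kept in S
    if k > 0:
        idx = np.argpartition(np.abs(R).ravel(), -k)[-k:]
        S.flat[idx] = R.flat[idx]
    return L, S

W = np.random.randn(32, 32)
L, S = sparse_plus_low_rank(W, rank=4, keep_frac=0.1)
rel_err = np.linalg.norm(W - (L + S)) / np.linalg.norm(W)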

L0Learn: A scalable package for sparse learning using L0 regularization

H Hazimeh, R Mazumder, T Nonet - Journal of Machine Learning Research, 2023 - jmlr.org
We present L0Learn: an open-source package for sparse linear regression and
classification using ℓ0 regularization. L0Learn implements scalable, approximate …
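
The ℓ0-regularized problem the snippet refers to, written out for linear regression (notation assumed for illustration, not copied from the package documentation):

  \min_{\beta \in \mathbb{R}^p} \; \tfrac{1}{2} \lVert y - X\beta \rVert_2^2 + \lambda \lVert \beta \rVert_0,
  \qquad \lVert \beta \rVert_0 = |\{ j : \beta_j \neq 0 \}|

The penalty counts nonzero coefficients directly, which makes the problem combinatorial; the scalable, approximate algorithms mentioned in the abstract are what make it usable at scale.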

Less is KEN: a universal and simple non-parametric pruning algorithm for large language models

M Mastromattei, FM Zanzotto - arXiv preprint arXiv:2402.03142, 2024 - arxiv.org
Neural network pruning has become increasingly crucial due to the complexity of neural
network models and their widespread use in various fields. Existing pruning algorithms often …

LayerMerge: Neural Network Depth Compression through Layer Pruning and Merging

J Kim, ME Halabi, M Ji, HO Song - arXiv preprint arXiv:2406.12837, 2024 - arxiv.org
Recent works show that reducing the number of layers in a convolutional neural network can
enhance efficiency while maintaining the performance of the network. Existing depth …
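
A minimal sketch of why depth compression is possible at all, assuming fully connected layers for simplicity (the paper works with convolutional networks and jointly chooses which layers to prune or merge; this only shows the basic algebra): two consecutive linear maps with no nonlinearity between them collapse into a single layer.

import numpy as np

def merge_linear(W1, b1, W2, b2):
    """Return (W, b) such that W @ x + b == W2 @ (W1 @ x + b1) + b2 for all x."""
    return W2 @ W1, W2 @ b1 + b2

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
W2, b2 = rng.normal(size=(2, 5)), rng.normal(size=2)
W, b = merge_linear(W1, b1, W2, b2)

x = rng.normal(size=3)
assert np.allclose(W2 @ (W1 @ x + b1) + b2, W @ x + b)  # merged layer matches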