Efficient tensor decomposition-based filter pruning

Y Zniyed, TP Nguyen - Neural Networks, 2024 - Elsevier
In this paper, we present CORING, which is short for effiCient tensOr decomposition-based
filteR prunING, a novel filter pruning methodology for neural networks. CORING is crafted to …

[PDF][PDF] Plug-and-play: An efficient post-training pruning method for large language models

Y Zhang, H Bai, H Lin, J Zhao, L Hou… - The Twelfth …, 2024 - preprints.org
With the rapid growth of large language models (LLMs), there is increasing demand for
memory and computation in LLMs. Recent efforts on post-training pruning of LLMs aim to …

Discovering sparsity allocation for layer-wise pruning of large language models

L Li, P Dong, Z Tang, X Liu, Q Wang… - Advances in …, 2025 - proceedings.neurips.cc
In this paper, we present DSA, the first automated framework for discovering sparsity
allocation schemes for layer-wise pruning in Large Language Models (LLMs). LLMs have …

BESA: Pruning large language models with blockwise parameter-efficient sparsity allocation

P Xu, W Shao, M Chen, S Tang, K Zhang, P Gao… - arXiv preprint arXiv …, 2024 - arxiv.org
Large language models (LLMs) have demonstrated outstanding performance in various
tasks such as text summarization and text question-answering. While their performance …

MaskLLM: Learnable semi-structured sparsity for large language models

G Fang, H Yin, S Muralidharan, G Heinrich… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) are distinguished by their massive parameter counts, which
typically result in significant redundancy. This work introduces MaskLLM, a learnable …

Automatic network pruning via Hilbert-Schmidt independence criterion lasso under information bottleneck principle

S Guo, L Zhang, X Zheng, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Most existing neural network pruning methods hand-craft their importance criteria and
structures to prune. This constructs heavy and unintended dependencies on heuristics and …

Efficient Deep Learning Infrastructures for Embedded Computing Systems: A Comprehensive Survey and Future Envision

X Luo, D Liu, H Kong, S Huai, H Chen… - ACM Transactions on …, 2024 - dl.acm.org
Deep neural networks (DNNs) have recently achieved impressive success across a wide
range of real-world vision and language processing tasks, spanning from image …

Towards performance-maximizing neural network pruning via global channel attention

Y Wang, S Guo, J Guo, J Zhang, W Zhang, C Yan… - Neural Networks, 2024 - Elsevier
Network pruning has attracted increasing attention recently for its capability of transferring
large-scale neural networks (e.g., CNNs) into resource-constrained devices. Such a transfer …

Adaptive Layer Sparsity for Large Language Models via Activation Correlation Assessment

W Li, L Li, M Lee, S Sun - Advances in Neural Information …, 2025 - proceedings.neurips.cc
Abstract Large Language Models (LLMs) have revolutionized the field of natural language
processing with their impressive capabilities. However, their enormous size presents …

ELSA: Exploiting layer-wise N:M sparsity for vision transformer acceleration

NC Huang, CC Chang, WC Lin… - Proceedings of the …, 2024 - openaccess.thecvf.com
N:M sparsity is an emerging model compression method supported by more and more
accelerators to speed up sparse matrix multiplication in deep neural networks. Most existing …
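The N:M pattern referenced in the last two entries (e.g., the 2:4 sparsity that some accelerators support in hardware) can be sketched in a few lines of NumPy. This is a minimal magnitude-based illustration, not the method of any paper above; `nm_prune` is a hypothetical helper name:

```python
import numpy as np

def nm_prune(weights, n=2, m=4):
    """In every group of M consecutive weights, zero out all but the
    N largest-magnitude entries (N:M semi-structured sparsity)."""
    w = weights.reshape(-1, m)                      # group into rows of M
    # per group, indices of the (M - N) smallest-magnitude entries to drop
    drop = np.argsort(np.abs(w), axis=1)[:, : m - n]
    mask = np.ones_like(w, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (w * mask).reshape(weights.shape)

w = np.array([0.9, -0.1, 0.5, 0.05, -0.7, 0.2, 0.3, -0.8])
print(nm_prune(w))  # exactly 2 of every 4 consecutive weights survive
```

Because exactly N of every M entries are nonzero at fixed positions within each group, the mask can be stored compactly and the matrix multiply skipped for the pruned slots, which is what sparse tensor cores exploit.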