A survey on deep neural network pruning: Taxonomy, comparison, analysis, and recommendations

H Cheng, M Zhang, JQ Shi - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org
Modern deep neural networks, particularly recent large language models, come with
massive model sizes that require significant computational and storage resources. To …

A survey of techniques for optimizing transformer inference

KT Chitty-Venkata, S Mittal, M Emani… - Journal of Systems …, 2023 - Elsevier
Recent years have seen a phenomenal rise in the performance and applications of
transformer neural networks. The family of transformer networks, including Bidirectional …

SPViT: Enabling faster vision transformers via latency-aware soft token pruning

Z Kong, P Dong, X Ma, X Meng, W Niu, M Sun… - European conference on …, 2022 - Springer
Recently, Vision Transformer (ViT) has continuously established new milestones in
the computer vision field, while the high computation and memory cost makes its …
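
As context for what soft token pruning refers to, here is a minimal, generic PyTorch sketch of the idea, not SPViT's actual latency-aware algorithm: low-importance patch tokens are dropped and folded into one aggregate token. The keep ratio and the importance scores (e.g. attention received per token) are assumptions made purely for illustration.

import torch

def prune_tokens(tokens, scores, keep_ratio=0.7):
    # Generic top-k token pruning: keep the highest-scoring patch tokens and
    # fold the rest into a single aggregate token so their information is not
    # discarded outright. tokens: (B, N, D); scores: (B, N) importance values.
    B, N, D = tokens.shape
    k = max(1, int(N * keep_ratio))
    idx = scores.topk(k, dim=1).indices                              # (B, k)
    kept = tokens.gather(1, idx.unsqueeze(-1).expand(-1, -1, D))     # (B, k, D)
    mask = torch.ones(B, N, dtype=torch.bool, device=tokens.device)
    mask.scatter_(1, idx, False)                                     # True where a token is pruned
    denom = mask.sum(1, keepdim=True).clamp(min=1).unsqueeze(-1)     # (B, 1, 1)
    merged = (tokens * mask.unsqueeze(-1)).sum(1, keepdim=True) / denom
    return torch.cat([kept, merged], dim=1)                          # (B, k+1, D)

In a real ViT a block like this would sit between transformer layers, with the scores taken from the model itself; how the keep ratio is scheduled against a latency budget is beyond this sketch.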

Green AI: Do deep learning frameworks have different costs?

S Georgiou, M Kechagia, T Sharma, F Sarro… - Proceedings of the 44th …, 2022 - dl.acm.org
The use of Artificial Intelligence (AI), and more specifically of Deep Learning (DL), in modern
software systems is nowadays widespread and continues to grow. At the same time, its …

Compressing large-scale transformer-based models: A case study on BERT

P Ganesh, Y Chen, X Lou, MA Khan, Y Yang… - Transactions of the …, 2021 - direct.mit.edu
Pre-trained Transformer-based models have achieved state-of-the-art performance for
various Natural Language Processing (NLP) tasks. However, these models often have …

Pruning self-attentions into convolutional layers in single path

H He, J Cai, J Liu, Z Pan, J Zhang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Vision Transformers (ViTs) have achieved impressive performance across various computer
vision tasks. However, modeling global correlations with multi-head self-attention (MSA) …

A survey of deep learning on CPUs: opportunities and co-optimizations

S Mittal, P Rajput, S Subramoney - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
The CPU is a powerful, pervasive, and indispensable platform for running deep learning (DL)
workloads in systems ranging from mobile to extreme-end servers. In this article, we present …

Accelerating framework of transformer by hardware design and model compression co-optimization

P Qi, EHM Sha, Q Zhuge, H Peng… - 2021 IEEE/ACM …, 2021 - ieeexplore.ieee.org
State-of-the-art Transformer-based models, with their gigantic parameter counts, are difficult to
accommodate on resource-constrained embedded devices. Moreover, with the …

Gradient-free structured pruning with unlabeled data

A Nova, H Dai, D Schuurmans - International Conference on …, 2023 - proceedings.mlr.press
Large Language Models (LLMs) have achieved great success in solving difficult
tasks across many domains, but such success comes with a high computation cost, and …
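
To make "gradient-free" and "unlabeled" concrete, the following is a minimal sketch of one common label-free scoring recipe (activation statistics collected with forward hooks), not the algorithm proposed in this paper; the layer handle and the scoring rule are assumptions.

import torch

@torch.no_grad()
def channel_scores(model, layer, unlabeled_loader, device="cpu"):
    # Forward unlabeled inputs and rank the chosen layer's output channels by
    # mean absolute activation; the lowest-scoring channels become candidates
    # for structured removal. No labels and no backward pass are needed.
    model.eval().to(device)
    scores = None

    def hook(_module, _inputs, output):
        nonlocal scores
        s = output.abs().mean(dim=tuple(range(output.dim() - 1)))  # per-channel mean |activation|
        scores = s if scores is None else scores + s

    handle = layer.register_forward_hook(hook)
    for x in unlabeled_loader:       # inputs only: labels are never touched
        model(x.to(device))
    handle.remove()
    return scores                    # shape: (num_channels,)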

Joint structured pruning and dense knowledge distillation for efficient transformer model compression

B Cui, Y Li, Z Zhang - Neurocomputing, 2021 - Elsevier
In this paper, we develop a novel Joint Model Compression (referred to as JMC) method by
combining structured pruning and dense knowledge distillation techniques to significantly …
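
As a rough illustration of how structured pruning and knowledge distillation are commonly combined, the sketch below pairs an L1-based head-importance score with a temperature-scaled distillation loss; it is a generic recipe with assumed hyperparameters (T, alpha) and an assumed weight layout, not the JMC method itself.

import torch.nn.functional as F

def head_importance(w_out):
    # w_out: (num_heads, head_dim, model_dim) slices of an attention output
    # projection (assumed layout). Heads with the smallest L1 mass are the
    # candidates for structured removal.
    return w_out.abs().sum(dim=(1, 2))

def pruned_student_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Cross-entropy on hard labels plus KL divergence toward the dense
    # teacher's softened predictions; T and alpha are illustrative defaults.
    ce = F.cross_entropy(student_logits, labels)
    kd = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * ce + (1.0 - alpha) * kd

The pruned model plays the student and the original dense model the teacher, so the compressed network is trained to match the dense network's outputs rather than the labels alone.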