Model compression and hardware acceleration for neural networks: A comprehensive survey

L Deng, G Li, S Han, L Shi, Y Xie - Proceedings of the IEEE, 2020 - ieeexplore.ieee.org
Domain-specific hardware is becoming a promising topic against the backdrop of slowing
improvement for general-purpose processors due to the foreseeable end of Moore's Law …

Channel permutations for N:M sparsity

J Pool, C Yu - Advances in Neural Information Processing Systems, 2021 - proceedings.neurips.cc
We introduce channel permutations as a method to maximize the accuracy of N:M sparse
networks. N:M sparsity requires N out of M consecutive elements to be zero and has been …
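To make the N:M constraint concrete, here is a minimal PyTorch sketch of magnitude-based N:M pruning plus the effect of a channel permutation. It illustrates the idea only and is not the authors' released code; nm_prune is a name chosen here, and the random permutation is a stand-in for the paper's optimized permutation search.

import torch

def nm_prune(weight: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """Zero the n smallest-magnitude entries in every group of m
    consecutive weights along the input dimension (dim 1)."""
    out_ch, in_ch = weight.shape
    assert in_ch % m == 0
    groups = weight.abs().reshape(out_ch, in_ch // m, m)
    idx = groups.topk(n, dim=-1, largest=False).indices   # n smallest per group
    mask = torch.ones_like(groups).scatter_(-1, idx, 0.0)
    return weight * mask.reshape(out_ch, in_ch)

# Permuting input channels changes how weights fall into groups of m,
# so a good permutation raises the magnitude that survives pruning.
w = torch.randn(64, 128)
perm = torch.randperm(w.shape[1])                         # random trial permutation
kept_identity = nm_prune(w).abs().sum().item()
kept_permuted = nm_prune(w[:, perm]).abs().sum().item()
print(kept_identity, kept_permuted)

Pool and Yu search for the permutation that maximizes the retained magnitude rather than sampling one at random, as done above.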

1×N pattern for pruning convolutional neural networks

M Lin, Y Zhang, Y Li, B Chen, F Chao… - IEEE Transactions on Pattern Analysis and Machine Intelligence, 2022 - ieeexplore.ieee.org
Though network pruning has gained popularity for reducing the complexity of convolutional
neural networks (CNNs), it remains an open issue to concurrently maintain model accuracy …
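As a rough illustration of the pattern: the sketch below groups every N consecutive output kernels that share an input channel into one 1×N block and zeroes the lowest-magnitude blocks. prune_1xN and the L1 ranking are assumptions based on the abstract, not the authors' implementation.

import torch

def prune_1xN(weight: torch.Tensor, N: int = 4, ratio: float = 0.5) -> torch.Tensor:
    """Group every N consecutive output kernels that share an input channel
    into one 1xN block, then zero the lowest-L1 ~ratio fraction of blocks.
    weight: [C_out, C_in, kH, kW]."""
    c_out, c_in, kh, kw = weight.shape
    assert c_out % N == 0
    blocks = weight.reshape(c_out // N, N, c_in, kh * kw)
    scores = blocks.abs().sum(dim=(1, 3))                 # [C_out/N, C_in]
    k = max(1, int(scores.numel() * ratio))
    thresh = scores.flatten().kthvalue(k).values
    mask = (scores > thresh).float()[:, None, :, None]    # broadcast over block
    return (blocks * mask).reshape(c_out, c_in, kh, kw)

w = torch.randn(32, 16, 3, 3)
print((prune_1xN(w) == 0).float().mean().item())          # ~0.5 zeroed, in 1xN blocks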

Bi-directional masks for efficient N:M sparse training

Y Zhang, Y Luo, M Lin, Y Zhong, J Xie… - International Conference on Machine Learning, 2023 - proceedings.mlr.press
We focus on addressing the dense backward propagation issue for training efficiency of N:M
fine-grained sparsity that preserves at most N out of M consecutive weights and achieves …
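The mechanism, as the abstract describes it: the forward pass already benefits from an N:M mask on W, but backpropagation multiplies by the transpose of W, so a second mask along the other dimension is needed to keep that pass sparse as well. A minimal sketch under that reading (nm_mask is a helper defined here, not from the paper):

import torch

def nm_mask(w: torch.Tensor, n: int = 2, m: int = 4) -> torch.Tensor:
    """0/1 mask that zeroes the n smallest of every m consecutive weights
    along dim 1."""
    rows, cols = w.shape
    g = w.abs().reshape(rows, cols // m, m)
    idx = g.topk(n, dim=-1, largest=False).indices
    return torch.ones_like(g).scatter_(-1, idx, 0.0).reshape(rows, cols)

w = torch.randn(64, 64)
fwd_mask = nm_mask(w)          # forward pass: y = (w * fwd_mask) @ x is N:M sparse
bwd_mask = nm_mask(w.t()).t()  # backward pass multiplies by w^T, so a second mask,
                               # N:M along the other dimension, keeps dx = w^T @ dy sparse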

Dual dynamic inference: Enabling more efficient, adaptive, and controllable deep inference

Y Wang, J Shen, TK Hu, P Xu, T Nguyen… - IEEE Journal of Selected Topics in Signal Processing, 2020 - ieeexplore.ieee.org
State-of-the-art convolutional neural networks (CNNs) yield record-breaking predictive
performance, yet at the cost of high-energy-consumption inference, which prohibits their wide …

REAF: Remembering enhancement and entropy-based asymptotic forgetting for filter pruning

X Zhang, W Xie, Y Li, K Jiang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Neurologically, filter pruning is a procedure of forgetting and remembering (recovering).
Prevailing methods directly forget less important information from an unrobust baseline at …

Accelerating convolutional neural networks via a 2D entropy-based adaptive filter search method for image recognition

C Li, H Li, G Gao, Z Liu, P Liu - Applied Soft Computing, 2023 - Elsevier
The success of CNNs for various vision tasks has been accompanied by a significant
increase in required FLOPs and parameter quantities, which has impeded the deployment of …
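The snippet does not reproduce the paper's 2D entropy measure, so the following sketch substitutes a plain histogram entropy of each filter's activations as the ranking criterion; filter_entropy_scores is a hypothetical name and the criterion is an illustrative stand-in only.

import torch

def filter_entropy_scores(fmaps: torch.Tensor, bins: int = 32) -> torch.Tensor:
    """Score each filter by the Shannon entropy of its activations over a
    calibration batch. fmaps: [B, C, H, W] -> scores: [C]."""
    scores = torch.empty(fmaps.shape[1])
    for i in range(fmaps.shape[1]):
        hist = torch.histc(fmaps[:, i].flatten(), bins=bins)  # activation histogram
        p = hist / hist.sum()
        p = p[p > 0]
        scores[i] = -(p * p.log()).sum()                      # low entropy -> prune first
    return scores

acts = torch.relu(torch.randn(8, 64, 14, 14))                 # stand-in feature maps
keep = filter_entropy_scores(acts).argsort(descending=True)[:48]  # keep top 75%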

Co-exploring structured sparsification and low-rank tensor decomposition for compact DNNs

Y Sui, M Yin, Y Gong, B Yuan - IEEE Transactions on Neural Networks and Learning Systems, 2024 - ieeexplore.ieee.org
Sparsification and low-rank decomposition are two important techniques to compress deep
neural network (DNN) models. To date, these two popular yet distinct approaches are …
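To illustrate the structure being co-explored: a one-shot sketch that splits a weight matrix into a truncated-SVD low-rank part plus a column-sparse residual. The paper learns sparsity and decomposition jointly; this stand-in (lowrank_plus_sparse, named here) only shows the shape of the compressed form.

import torch

def lowrank_plus_sparse(w: torch.Tensor, rank: int = 8, keep_cols: int = 16):
    """Split W into a rank-`rank` factorization plus a column-sparse residual."""
    U, S, Vh = torch.linalg.svd(w, full_matrices=False)
    low = U[:, :rank] @ torch.diag(S[:rank]) @ Vh[:rank]
    resid = w - low
    top = resid.abs().sum(dim=0).topk(keep_cols).indices  # strongest residual columns
    sparse = torch.zeros_like(resid)
    sparse[:, top] = resid[:, top]
    return low, sparse

w = torch.randn(64, 64)
low, sparse = lowrank_plus_sparse(w)
print(((w - (low + sparse)).norm() / w.norm()).item())    # reconstruction error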

ERA-LSTM: An efficient ReRAM-based architecture for long short-term memory

J Han, H Liu, M Wang, Z Li… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
Processing-in-memory (PIM) architecture based on resistive random access memory
(ReRAM) crossbars is a promising solution to the memory bottleneck that long short-term …
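A toy functional model of the crossbar primitive, not the ERA-LSTM microarchitecture: weights are quantized to a small number of conductance levels and each matrix-vector product is computed inside the array, which is what lets PIM avoid shuttling LSTM weights through the memory hierarchy. crossbar_mvm and the symmetric quantization are assumptions made here.

import torch

def crossbar_mvm(w: torch.Tensor, x: torch.Tensor, levels: int = 16) -> torch.Tensor:
    """Functional model of one ReRAM crossbar matrix-vector product: weights
    are quantized to a few conductance levels, and the multiply-accumulate
    happens inside the array (modeled here as a plain matmul)."""
    scale = w.abs().max() / (levels // 2)
    g = torch.round(w / scale).clamp(-(levels // 2), levels // 2 - 1)
    return (g * scale) @ x

# LSTM gate pre-activations as crossbar MVMs instead of DRAM-bound GEMMs
W, x = torch.randn(4 * 128, 128), torch.randn(128)
gates = crossbar_mvm(W, x)     # [512], split into i/f/g/o gate slices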

A sparse CNN accelerator for eliminating redundant computations in intra- and inter-convolutional/pooling layers

C Yang, Y Meng, K Huo, J Xi… - IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2022 - ieeexplore.ieee.org
Neural network pruning, which can be divided into unstructured pruning and structured
pruning strategies, has been proven to be an efficient method to substantially reduce the …
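To show what eliminating redundant computations means at the operation level: a naive convolution that enumerates the kernel's nonzeros once and issues MACs only for them, a software mimic of the zero-skipping such accelerators implement in hardware. sparse_conv2d_1ch is an illustrative name; real designs also skip zero activations and pooling-redundant windows.

import torch

def sparse_conv2d_1ch(x: torch.Tensor, w: torch.Tensor) -> torch.Tensor:
    """Naive single-channel, stride-1 convolution that skips zero weights:
    a software mimic of hardware zero-skipping."""
    H, W = x.shape
    kh, kw = w.shape
    nonzeros = [(i, j, w[i, j].item())
                for i in range(kh) for j in range(kw) if w[i, j] != 0]
    out = torch.zeros(H - kh + 1, W - kw + 1)
    for i, j, v in nonzeros:                   # MACs issued only for nonzeros
        out += v * x[i:i + out.shape[0], j:j + out.shape[1]]
    return out

kernel = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])  # pruned kernel
y = sparse_conv2d_1ch(torch.randn(8, 8), kernel)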