Model compression and hardware acceleration for neural networks: A comprehensive survey
Domain-specific hardware is becoming a promising topic against the backdrop of slowing
improvement in general-purpose processors due to the foreseeable end of Moore's Law …
Channel permutations for N:M sparsity
We introduce channel permutations as a method to maximize the accuracy of N:M sparse
networks. N:M sparsity requires N out of M consecutive elements to be zero and has been …
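The N:M constraint is concrete enough to sketch. Below is a minimal numpy illustration of 2:4 magnitude pruning, with a crude random search over input-channel permutations standing in for the paper's optimized permutation search; `nm_prune` and `retained_magnitude` are hypothetical helper names, not the paper's API.

```python
import numpy as np

def nm_prune(w, n=2, m=4):
    """Zero all but the n largest-magnitude weights in each group of
    m consecutive elements along the input dimension (e.g. 2:4 sparsity)."""
    out_ch, in_ch = w.shape
    w = w.reshape(out_ch, in_ch // m, m).copy()
    # indices of the (m - n) smallest-magnitude entries in each group
    drop = np.argsort(np.abs(w), axis=-1)[..., : m - n]
    np.put_along_axis(w, drop, 0.0, axis=-1)
    return w.reshape(out_ch, in_ch)

def retained_magnitude(w, perm, n=2, m=4):
    """Total |weight| surviving N:M pruning after permuting input channels."""
    return np.abs(nm_prune(w[:, perm], n, m)).sum()

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 32))
identity = np.arange(w.shape[1])

# Random search only illustrates the point: reordering channels changes
# how much magnitude the N:M mask can keep.
best = max((rng.permutation(w.shape[1]) for _ in range(200)),
           key=lambda p: retained_magnitude(w, p))
print(retained_magnitude(w, identity), retained_magnitude(w, best))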
1×N pattern for pruning convolutional neural networks
Though network pruning has gained popularity for reducing the complexity of convolutional
neural networks (CNNs), it remains an open issue to concurrently maintain model accuracy …
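A rough sketch of what a 1×N pattern can look like in practice, assuming blocks of N consecutive output channels at the same input-channel position, removed whole by L1 norm; the paper's exact grouping and criterion may differ.

```python
import numpy as np

def prune_1xn(w, n=4, keep_ratio=0.5):
    """Sketch of 1xN block pruning: group weights into blocks of n
    consecutive output channels at the same input-channel position,
    then drop whole blocks with the smallest L1 norms."""
    c_out, c_in, kh, kw = w.shape
    blocks = w.reshape(c_out // n, n, c_in, kh, kw)
    score = np.abs(blocks).sum(axis=(1, 3, 4))      # block L1 norms
    k = int(score.size * (1 - keep_ratio))          # blocks to drop
    thresh = np.partition(score.ravel(), k)[k]
    mask = (score >= thresh)[:, None, :, None, None]
    return (blocks * mask).reshape(c_out, c_in, kh, kw)

w = np.random.default_rng(1).normal(size=(16, 8, 3, 3))
print((prune_1xn(w) == 0).mean())   # fraction of zeroed weights, ~0.5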
Bi-directional masks for efficient N:M sparse training
We focus on addressing the dense backward propagation issue for training efficiency of
N:M fine-grained sparsity that preserves at most N out of M consecutive weights and achieves …
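The core idea, as the snippet describes it, is to keep the backward GEMMs sparse as well as the forward one. A toy numpy sketch, assuming a magnitude-based N:M mask applied along one dimension for the forward pass and along the other for the input-gradient pass; this is not the paper's exact mask construction.

```python
import numpy as np

def nm_mask(w, n=2, m=4):
    """Binary mask keeping the n largest-|w| entries per m consecutive
    elements along the last axis."""
    g = np.abs(w).reshape(*w.shape[:-1], w.shape[-1] // m, m)
    keep = np.argsort(g, axis=-1)[..., m - n:]
    mask = np.zeros_like(g)
    np.put_along_axis(mask, keep, 1.0, axis=-1)
    return mask.reshape(w.shape)

rng = np.random.default_rng(2)
w = rng.normal(size=(8, 16))
x = rng.normal(size=(4, 16))

# Forward pass: N:M mask along the input dimension.
y = x @ (w * nm_mask(w)).T
# Backward pass: mask W along the *other* dimension, so the
# input-gradient GEMM is N:M sparse too.
grad_y = rng.normal(size=y.shape)
grad_x = grad_y @ (w * nm_mask(w.T).T)
print(y.shape, grad_x.shape)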
Dual dynamic inference: Enabling more efficient, adaptive, and controllable deep inference
State-of-the-art convolutional neural networks (CNNs) yield record-breaking predictive
performance, yet at the cost of high-energy-consumption inference, which prohibits their wide …
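A toy illustration of one of the two dynamic dimensions the title refers to, input-adaptive layer skipping; the linear gate below is a hypothetical stand-in for the paper's learned gating networks.

```python
import numpy as np

def dynamic_forward(x, layers, gates, threshold=0.5):
    """Toy input-adaptive inference: each layer runs only when its gate
    (here a tiny linear scorer) fires for this particular input."""
    for layer, gate in zip(layers, gates):
        if float(x @ gate) > threshold:   # per-input decision
            x = np.tanh(x @ layer)        # execute the layer
        # else: skip the layer entirely, saving its FLOPs
    return x

rng = np.random.default_rng(3)
dim = 8
layers = [rng.normal(size=(dim, dim)) for _ in range(4)]
gates = [rng.normal(size=dim) for _ in range(4)]
print(dynamic_forward(rng.normal(size=dim), layers, gates))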
REAF: Remembering enhancement and entropy-based asymptotic forgetting for filter pruning
Neurologically, filter pruning is a procedure of forgetting and memory recovery.
Prevailing methods directly forget less important information from an unrobust baseline at …
Accelerating convolutional neural networks via a 2D entropy-based adaptive filter search method for image recognition
The success of CNNs for various vision tasks has been accompanied by a significant
increase in required FLOPs and parameter quantities, which has impeded the deployment of …
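A minimal sketch of entropy-driven filter selection, assuming a histogram-based Shannon entropy over each filter's output feature map as the importance score; the paper's 2D entropy is likely defined differently, so treat `feature_entropy` as an illustrative stand-in.

```python
import numpy as np

def feature_entropy(fmap, bins=16):
    """Shannon entropy of one 2D feature map's intensity histogram,
    used here as a stand-in filter-importance score."""
    hist, _ = np.histogram(fmap, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

rng = np.random.default_rng(4)
fmaps = rng.normal(size=(32, 14, 14))         # one map per filter
scores = np.array([feature_entropy(f) for f in fmaps])
keep = np.argsort(scores)[len(scores) // 2:]  # keep the higher-entropy half
print(sorted(keep))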
Co-exploring structured sparsification and low-rank tensor decomposition for compact DNNs
Sparsification and low-rank decomposition are two important techniques to compress deep
neural network (DNN) models. To date, these two popular yet distinct approaches are …
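A sketch of combining the two compressions on a single weight matrix. Note the simplification: this applies them sequentially (structured channel pruning, then truncated SVD), whereas the paper co-optimizes them jointly.

```python
import numpy as np

def sparsify_then_factor(w, keep_rows=0.75, rank=8):
    """Structured sparsity removes whole output channels by L2 norm,
    then truncated SVD factors the surviving submatrix."""
    norms = np.linalg.norm(w, axis=1)
    k = int(len(norms) * keep_rows)
    keep = np.sort(np.argsort(norms)[-k:])    # surviving channels
    u, s, vt = np.linalg.svd(w[keep], full_matrices=False)
    a = u[:, :rank] * s[:rank]                # (k, rank)
    b = vt[:rank]                             # (rank, in)
    return keep, a, b                         # w[keep] ~= a @ b

w = np.random.default_rng(5).normal(size=(64, 64))
keep, a, b = sparsify_then_factor(w)
err = np.linalg.norm(w[keep] - a @ b) / np.linalg.norm(w[keep])
print(len(keep), a.shape, b.shape, round(err, 3))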
ERA-LSTM: An efficient ReRAM-based architecture for long short-term memory
J Han, H Liu, M Wang, Z Li… - IEEE Transactions on …, 2019 - ieeexplore.ieee.org
Processing-in-memory (PIM) architecture based on resistive random access memory
(ReRAM) crossbars is a promising solution to the memory bottleneck that long short-term …
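An idealized software model of the crossbar primitive such architectures build on: weights quantized to a few conductance levels and split across positive and negative columns (a common differential mapping, assumed here rather than taken from the paper), with the array computing a matrix-vector product in one analog step.

```python
import numpy as np

def crossbar_mvm(w, v, levels=16, g_max=1.0):
    """Idealized analog crossbar: weights map to quantized conductances,
    positive and negative parts on separate columns; the array produces
    column currents i = v @ G in a single step."""
    scale = np.abs(w).max() / g_max
    g_pos = np.round(np.clip(w, 0, None) / scale * levels) / levels * g_max
    g_neg = np.round(np.clip(-w, 0, None) / scale * levels) / levels * g_max
    i = v @ g_pos - v @ g_neg     # differential readout
    return i * scale

rng = np.random.default_rng(7)
w = rng.normal(size=(16, 4))
v = rng.normal(size=16)
print(crossbar_mvm(w, v))
print(v @ w)                      # exact result for comparison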
A sparse CNN accelerator for eliminating redundant computations in intra- and inter-convolutional/pooling layers
C Yang, Y Meng, K Huo, J … - IEEE Transactions on Very …, 2022 - ieeexplore.ieee.org
Neural network pruning, which can be divided into unstructured pruning and structured
pruning strategies, has been proven to be an efficient method to substantially reduce the …
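A software model of the kind of zero-operand skipping such accelerators implement in hardware, counting the multiply-accumulates that sparsity eliminates; the scalar loop nest is purely illustrative.

```python
import numpy as np

def sparse_conv2d_1ch(x, w):
    """Scalar 2D convolution that skips a MAC whenever either operand
    is zero, and counts how many MACs sparsity eliminates."""
    H, W = x.shape
    kh, kw = w.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    done = skipped = 0
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for a in range(kh):
                for b in range(kw):
                    if w[a, b] == 0 or x[i + a, j + b] == 0:
                        skipped += 1   # redundant MAC eliminated
                    else:
                        out[i, j] += w[a, b] * x[i + a, j + b]
                        done += 1
    return out, done, skipped

rng = np.random.default_rng(6)
x = rng.normal(size=(8, 8)) * (rng.random((8, 8)) > 0.5)   # ReLU-like zeros
w = rng.normal(size=(3, 3)) * (rng.random((3, 3)) > 0.5)   # pruned weights
out, done, skipped = sparse_conv2d_1ch(x, w)
print(done, skipped)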