Structured pruning for deep convolutional neural networks: A survey

Y He, L Xiao - IEEE Transactions on Pattern Analysis and …, 2023 - ieeexplore.ieee.org
The remarkable performance of deep convolutional neural networks (CNNs) is generally
attributed to their deeper and wider architectures, which can come with significant …

A comprehensive survey on model compression and acceleration

T Choudhary, V Mishra, A Goswami… - Artificial Intelligence …, 2020 - Springer
In recent years, machine learning (ML) and deep learning (DL) have shown remarkable
improvement in computer vision, natural language processing, stock prediction, forecasting …

FlashAttention: Fast and memory-efficient exact attention with IO-awareness

T Dao, D Fu, S Ermon, A Rudra… - Advances in Neural …, 2022 - proceedings.neurips.cc
Transformers are slow and memory-hungry on long sequences, since the time and memory
complexity of self-attention are quadratic in sequence length. Approximate attention …

On-device training under 256KB memory

J Lin, L Zhu, WM Chen, WC Wang… - Advances in Neural …, 2022 - proceedings.neurips.cc
On-device training enables the model to adapt to new data collected from the sensors by
fine-tuning a pre-trained model. Users can benefit from customized AI models without having …

A-ViT: Adaptive tokens for efficient vision transformer

H Yin, A Vahdat, JM Alvarez, A Mallya… - Proceedings of the …, 2022 - openaccess.thecvf.com
We introduce A-ViT, a method that adaptively adjusts the inference cost of vision transformer
(ViT) for images of different complexity. A-ViT achieves this by automatically reducing the …

Dynamic neural networks: A survey

Y Han, G Huang, S Song, L Yang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Dynamic neural networks are an emerging research topic in deep learning. Compared to static
models, which have fixed computational graphs and parameters at the inference stage …

Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks

T Hoefler, D Alistarh, T Ben-Nun, N Dryden… - Journal of Machine …, 2021 - jmlr.org
The growing energy and performance costs of deep learning have driven the community to
reduce the size of neural networks by selectively pruning components. Similarly to their …

Pruning and quantization for deep neural network acceleration: A survey

T Liang, J Glossner, L Wang, S Shi, X Zhang - Neurocomputing, 2021 - Elsevier
Deep neural networks have been applied in many applications exhibiting extraordinary
abilities in the field of computer vision. However, complex network architectures challenge …

AdaViT: Adaptive vision transformers for efficient image recognition

L Meng, H Li, BC Chen, S Lan, Z Wu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Built on top of self-attention mechanisms, vision transformers have demonstrated
remarkable performance on a variety of vision tasks recently. While achieving excellent …

MCUNet: Tiny deep learning on IoT devices

J Lin, WM Chen, Y Lin, C Gan… - Advances in neural …, 2020 - proceedings.neurips.cc
Machine learning on tiny IoT devices based on microcontroller units (MCUs) is
appealing but challenging: the memory of microcontrollers is 2-3 orders of magnitude …