Efficient acceleration of deep learning inference on resource-constrained edge devices: A review

MMH Shuvo, SK Islam, J Cheng… - Proceedings of the …, 2022 - ieeexplore.ieee.org
Successful integration of deep neural networks (DNNs) or deep learning (DL) has resulted
in breakthroughs in many areas. However, deploying these highly accurate models for data …

Adapting neural networks at runtime: Current trends in at-runtime optimizations for deep learning

M Sponner, B Waschneck, A Kumar - ACM Computing Surveys, 2024 - dl.acm.org
Adaptive optimization methods for deep learning adjust the inference task to the current
circumstances at runtime to improve the resource footprint while maintaining the model's …
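
One common flavor of such at-runtime adaptation is confidence-gated model selection. The sketch below is only an illustration of the idea, not the survey's method; `small_model` and `large_model` are hypothetical pre-trained classifiers assumed to map an input to class logits.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def adaptive_predict(x, small_model, large_model, threshold=0.9):
    """Run the cheap model first; escalate to the expensive model only
    when the cheap model's top-class confidence is below the threshold."""
    probs = softmax(small_model(x))
    if probs.max() >= threshold:
        return int(probs.argmax())                   # fast path for easy inputs
    return int(softmax(large_model(x)).argmax())     # accurate path for hard inputs
```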

Pruning and quantization for deep neural network acceleration: A survey

T Liang, J Glossner, L Wang, S Shi, X Zhang - Neurocomputing, 2021 - Elsevier
Deep neural networks have been applied in many applications, exhibiting extraordinary
abilities in the field of computer vision. However, complex network architectures challenge …
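
As a rough illustration of the two techniques this survey covers, the sketch below applies unstructured magnitude pruning and symmetric post-training int8 quantization to a weight matrix. The sparsity level and bit width are arbitrary choices for the example, not recommendations from the survey.

```python
import numpy as np

def magnitude_prune(w, sparsity=0.5):
    """Zero out the smallest-magnitude weights (unstructured pruning)."""
    threshold = np.quantile(np.abs(w), sparsity)
    return np.where(np.abs(w) < threshold, 0.0, w)

def quantize_int8(w):
    """Symmetric post-training quantization to int8 with one per-tensor scale."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale                      # dequantize with q * scale

w = np.random.randn(64, 64).astype(np.float32)
w_sparse = magnitude_prune(w, sparsity=0.7)   # 70% of weights set to zero
q, scale = quantize_int8(w_sparse)            # 8-bit weights plus one float scale
```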

Dynamic neural networks: A survey

Y Han, G Huang, S Song, L Yang… - IEEE transactions on …, 2021 - ieeexplore.ieee.org
Dynamic neural networks are an emerging research topic in deep learning. Compared to static
models, which have fixed computational graphs and parameters at the inference stage …
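
A typical example of input-dependent dynamic inference is an early-exit network. The sketch below is illustrative only and assumes hypothetical lists `stages` (feature extractors) and `exits` (lightweight classifiers attached after each stage); easy inputs terminate early, hard inputs use the full depth.

```python
import numpy as np

def softmax(logits):
    e = np.exp(logits - logits.max())
    return e / e.sum()

def early_exit_forward(x, stages, exits, threshold=0.85):
    """Run stages sequentially; stop as soon as an intermediate exit is confident."""
    h, probs = x, None
    for depth, (stage, exit_head) in enumerate(zip(stages, exits)):
        h = stage(h)
        probs = softmax(exit_head(h))
        if probs.max() >= threshold:
            return probs, depth          # dynamic depth: exited early
    return probs, len(stages) - 1        # fell through to the final exit
```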

Binary neural networks: A survey

H Qin, R Gong, X Liu, X Bai, J Song, N Sebe - Pattern Recognition, 2020 - Elsevier
Binary neural networks, which largely save storage and computation, serve as a
promising technique for deploying deep models on resource-limited devices. However, the …
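
As a minimal sketch of what 1-bit weights look like, the function below binarizes a weight tensor to {-1, +1} with a per-tensor scaling factor, in the spirit of XNOR-Net style binarization; it is illustrative only and not an algorithm from this survey.

```python
import numpy as np

def binarize(w):
    """Binarize weights to {-1, +1} and keep one real-valued scaling factor."""
    alpha = np.abs(w).mean()             # per-tensor scale minimizing L2 error
    wb = np.where(w >= 0, 1.0, -1.0)     # sign(w), mapping 0 to +1
    return alpha * wb

w = np.random.randn(128, 128).astype(np.float32)
w_bin = binarize(w)    # 1 bit of information per weight plus one float scale
```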

Not all images are worth 16x16 words: Dynamic transformers for efficient image recognition

Y Wang, R Huang, S Song… - Advances in neural …, 2021 - proceedings.neurips.cc
Vision Transformers (ViT) have achieved remarkable success in large-scale image
recognition. They split every 2D image into a fixed number of patches, each of which is …
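
For reference, the standard ViT tokenization that this paper makes adaptive can be sketched as follows; the patch sizes below are arbitrary, and the example only shows how a coarser split yields fewer tokens for a cheaper first pass.

```python
import numpy as np

def patchify(image, patch):
    """Split an (H, W, C) image into non-overlapping patch tokens of size patch x patch."""
    H, W, C = image.shape
    assert H % patch == 0 and W % patch == 0
    tokens = image.reshape(H // patch, patch, W // patch, patch, C)
    tokens = tokens.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * C)
    return tokens                        # (num_patches, patch*patch*C)

img = np.random.rand(224, 224, 3)
coarse = patchify(img, 32)   # 49 tokens: cheap pass for easy images
fine = patchify(img, 16)     # 196 tokens: only if a finer look is needed
```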

Forward and backward information retention for accurate binary neural networks

H Qin, R Gong, X Liu, M Shen, Z Wei… - Proceedings of the …, 2020 - openaccess.thecvf.com
Weight and activation binarization is an effective approach to deep neural network
compression and can accelerate the inference by leveraging bitwise operations. Although …
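
The bitwise speedup comes from replacing multiply-accumulate with XNOR and popcount on packed sign bits. A minimal sketch of that arithmetic using plain Python integers (not the paper's training method) is shown below.

```python
def binary_dot(a_bits, b_bits, n):
    """Dot product of two {-1, +1} vectors of length n packed as integers
    (bit=1 encodes +1, bit=0 encodes -1) using XNOR + popcount."""
    xnor = ~(a_bits ^ b_bits) & ((1 << n) - 1)   # 1 wherever the signs agree
    matches = bin(xnor).count("1")               # popcount
    return 2 * matches - n                       # agreements minus disagreements

# (+1, -1, +1, +1) . (+1, +1, -1, +1) = 0
print(binary_dot(0b1011, 0b1101, 4))
```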

The lazy neuron phenomenon: On emergence of activation sparsity in transformers

Z Li, C You, S Bhojanapalli, D Li, AS Rawat… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper studies the curious phenomenon that, in machine learning models with Transformer
architectures, activation maps are sparse. By activation map we refer to the …
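
A minimal sketch of what "sparsity of an activation map" means for a transformer feed-forward layer: with random weights the ReLU zeroes roughly half the entries, whereas the paper reports far higher sparsity in trained models.

```python
import numpy as np

def mlp_activation_sparsity(x, w1, b1):
    """Fraction of zeros in the ReLU activation map of a feed-forward layer."""
    act = np.maximum(x @ w1 + b1, 0.0)     # ReLU activation map
    return float((act == 0).mean())        # share of inactive neurons

tokens = np.random.randn(16, 512)                           # 16 tokens, model dim 512
w1, b1 = np.random.randn(512, 2048) * 0.02, np.zeros(2048)  # expansion layer
print(mlp_activation_sparsity(tokens, w1, b1))              # ~0.5 for random weights
```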

Sanger: A co-design framework for enabling sparse attention using reconfigurable architecture

L Lu, Y Jin, H Bi, Z Luo, P Li, T Wang… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
In recent years, attention-based models have achieved impressive performance in natural
language processing and computer vision applications by effectively capturing contextual …
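
A generic illustration of sparse attention follows; Sanger itself predicts the sparsity mask with a low-precision, hardware-friendly computation and a reconfigurable accelerator, which this sketch does not model. Here, entries of the attention matrix below a threshold are simply masked out, so in principle they need not be computed or stored.

```python
import numpy as np

def sparse_attention(q, k, v, threshold=0.02):
    """Attention in which low-scoring entries are masked out, making the
    attention matrix sparse so masked positions can be skipped."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)
    probs = np.exp(scores - scores.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)
    mask = probs >= threshold                      # keep only significant links
    probs = np.where(mask, probs, 0.0)
    probs /= probs.sum(axis=-1, keepdims=True)     # renormalize survivors
    return probs @ v, mask

q, k, v = (np.random.randn(8, 64) for _ in range(3))
out, mask = sparse_attention(q, k, v)
print(mask.mean())    # fraction of attention entries actually kept
```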

Pnp-detr: Towards efficient visual analysis with transformers

T Wang, L Yuan, Y Chen, J Feng… - Proceedings of the …, 2021 - openaccess.thecvf.com
Recently, DETR pioneered solving vision tasks with transformers: it directly translates
the image feature map into the object detection result. Though effective, translating the full …
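
The inefficiency being addressed is that DETR feeds every spatial feature vector to the transformer. The sketch below only conveys the general idea of keeping a scored subset of tokens; the scoring weights here are random placeholders, not the paper's learned poll-and-pool modules.

```python
import numpy as np

def select_foreground_tokens(feature_map, score_w, keep_ratio=0.33):
    """Score each spatial feature vector and keep only the top-scoring fraction
    as fine tokens for the transformer (the rest could be pooled into a
    coarse background summary)."""
    H, W, C = feature_map.shape
    tokens = feature_map.reshape(-1, C)
    scores = tokens @ score_w                 # learned in the paper; random here
    k = max(1, int(keep_ratio * tokens.shape[0]))
    keep = np.argsort(scores)[-k:]
    return tokens[keep], keep

fmap = np.random.randn(32, 32, 256)
fine_tokens, idx = select_foreground_tokens(fmap, np.random.randn(256))
print(fine_tokens.shape)    # (337, 256) instead of all (1024, 256) tokens
```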