Efficient acceleration of deep learning inference on resource-constrained edge devices: A review

MMH Shuvo, SK Islam, J Cheng… - Proceedings of the …, 2022 - ieeexplore.ieee.org
Successful integration of deep neural networks (DNNs) or deep learning (DL) has resulted
in breakthroughs in many areas. However, deploying these highly accurate models for data …

An updated survey of efficient hardware architectures for accelerating deep convolutional neural networks

M Capra, B Bussolino, A Marchisio, M Shafique… - Future Internet, 2020 - mdpi.com
Deep Neural Networks (DNNs) are nowadays common practice in most Artificial
Intelligence (AI) applications. Their ability to go beyond human precision has made these …

Sparsity in deep learning: Pruning and growth for efficient inference and training in neural networks

T Hoefler, D Alistarh, T Ben-Nun, N Dryden… - Journal of Machine …, 2021 - jmlr.org
The growing energy and performance costs of deep learning have driven the community to
reduce the size of neural networks by selectively pruning components. Similarly to their …
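
As a rough illustration of the kind of technique this survey covers, the sketch below applies one-shot global magnitude pruning to a set of weight matrices; the layer shapes and the 90% sparsity target are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def global_magnitude_prune(weights, sparsity=0.9):
    """One-shot global magnitude pruning: zero out the smallest-magnitude
    weights across all layers until the target sparsity is reached."""
    all_mags = np.concatenate([np.abs(w).ravel() for w in weights])
    threshold = np.quantile(all_mags, sparsity)   # global cut-off, not per-layer
    masks = [np.abs(w) > threshold for w in weights]
    pruned = [w * m for w, m in zip(weights, masks)]
    return pruned, masks

# Toy example: two randomly initialised "layers".
rng = np.random.default_rng(0)
layers = [rng.standard_normal((64, 128)), rng.standard_normal((128, 10))]
pruned, masks = global_magnitude_prune(layers, sparsity=0.9)
print([float(m.mean()) for m in masks])  # fraction of weights kept per layer
```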

SpAtten: Efficient sparse attention architecture with cascade token and head pruning

H Wang, Z Zhang, S Han - 2021 IEEE International Symposium …, 2021 - ieeexplore.ieee.org
The attention mechanism is becoming increasingly popular in Natural Language Processing
(NLP) applications, showing performance superior to convolutional and recurrent …
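
To make the idea of cascade token pruning concrete, here is a minimal sketch, not the paper's implementation: attention probabilities are accumulated into a per-token importance score, and only the top-scoring tokens survive into later layers. The single-head setup, the importance definition, and the 50% keep ratio are simplifying assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def token_prune(q, k, v, keep_ratio=0.5):
    """Single-head attention followed by importance-based token pruning.
    Importance of token j = sum over all queries i of attn[i, j]."""
    d = q.shape[-1]
    attn = softmax(q @ k.T / np.sqrt(d))              # (n, n) attention probs
    out = attn @ v                                    # standard attention output
    importance = attn.sum(axis=0)                     # cumulative prob per token
    n_keep = max(1, int(len(importance) * keep_ratio))
    keep = np.sort(np.argsort(importance)[-n_keep:])  # top-k tokens, in order
    # Later layers only see the surviving tokens (the "cascade" effect).
    return out[keep], keep

rng = np.random.default_rng(1)
n, d = 8, 16
q, k, v = (rng.standard_normal((n, d)) for _ in range(3))
out, kept = token_prune(q, k, v, keep_ratio=0.5)
print(kept, out.shape)
```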

SIGMA: A sparse and irregular GEMM accelerator with flexible interconnects for DNN training

E Qin, A Samajdar, H Kwon, V Nadella… - … Symposium on High …, 2020 - ieeexplore.ieee.org
The advent of Deep Learning (DL) has radically transformed the computing industry across
the entire spectrum from algorithms to circuits. As myriad application domains embrace DL, it …

A modern primer on processing in memory

O Mutlu, S Ghose, J Gómez-Luna… - … computing: from devices …, 2022 - Springer
Modern computing systems are overwhelmingly designed to move data to computation. This
design choice goes directly against at least three key trends in computing that cause …

[BOOK] Efficient processing of deep neural networks

V Sze, YH Chen, TJ Yang, JS Emer - 2020 - Springer
This book provides a structured treatment of the key principles and techniques for enabling
efficient processing of deep neural networks (DNNs). DNNs are currently widely used for …

SpArch: Efficient architecture for sparse matrix multiplication

Z Zhang, H Wang, S Han… - 2020 IEEE International …, 2020 - ieeexplore.ieee.org
Generalized Sparse Matrix-Matrix Multiplication (SpGEMM) is a ubiquitous task in various
engineering and scientific applications. However, inner product based SpGEMM introduces …
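
As a point of reference for the inner- versus outer-product discussion, the sketch below computes an SpGEMM by accumulating outer products of A's columns with B's rows, touching only nonzero entries. The coordinate-list input format and the dictionary accumulator are assumptions made for brevity; they do not reproduce the paper's merging hardware.

```python
from collections import defaultdict

def spgemm_outer(a_entries, b_entries):
    """Outer-product SpGEMM on coordinate lists: for every nonzero A[i, k]
    and every nonzero B[k, j], accumulate the partial product into C[i, j]."""
    # Group B's nonzeros by row index k so each column of A meets its row of B.
    b_rows = defaultdict(list)
    for k, j, val in b_entries:
        b_rows[k].append((j, val))
    c = defaultdict(float)
    for i, k, a_val in a_entries:
        for j, b_val in b_rows.get(k, ()):
            c[(i, j)] += a_val * b_val    # merge partial products into C
    return dict(c)

# Toy 3x3 example in (row, col, value) coordinate format.
A = [(0, 0, 2.0), (0, 2, 1.0), (2, 1, 3.0)]
B = [(0, 1, 4.0), (1, 0, 5.0), (2, 1, -1.0)]
print(spgemm_outer(A, B))   # {(0, 1): 7.0, (2, 0): 15.0}
```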

GCNAX: A flexible and energy-efficient accelerator for graph convolutional neural networks

J Li, A Louri, A Karanth… - 2021 IEEE International …, 2021 - ieeexplore.ieee.org
Graph convolutional neural networks (GCNs) have emerged as an effective approach to
extend deep learning to graph data analytics. Given that graphs are usually irregular, as …
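
For context on what a GCN accelerator has to compute, the sketch below evaluates one graph-convolution layer, H' = ReLU(A_hat · H · W), with the symmetrically normalised adjacency A_hat. Forming H · W first and then applying A_hat reflects the common observation that the feature matrix shrinks after the dense multiply; the actual dataflow explored in the paper is configurable and not reproduced here.

```python
import numpy as np

def gcn_layer(adj, h, w):
    """One GCN layer: H' = ReLU(A_hat @ H @ W), where A_hat is the
    symmetrically normalised adjacency with self-loops."""
    a = adj + np.eye(adj.shape[0])             # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))  # D^{-1/2}
    a_hat = a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]
    # Dense multiply first (H @ W shrinks the feature dimension),
    # then the sparse, irregular aggregation A_hat @ (H @ W).
    return np.maximum(a_hat @ (h @ w), 0.0)

# Toy graph: 4 nodes, 3-dim input features, 2 output features.
rng = np.random.default_rng(2)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 1],
                [0, 1, 0, 0],
                [0, 1, 0, 0]], dtype=float)
h = rng.standard_normal((4, 3))
w = rng.standard_normal((3, 2))
print(gcn_layer(adj, h, w).shape)   # (4, 2)
```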

A survey on deep learning hardware accelerators for heterogeneous HPC platforms

C Silvano, D Ielmini, F Ferrandi, L Fiorin… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent trends in deep learning (DL) have established hardware accelerators as the most viable
solution for several classes of high-performance computing (HPC) applications, such as …