Hardware and software optimizations for accelerating deep neural networks: Survey of current trends, challenges, and the road ahead

M Capra, B Bussolino, A Marchisio, G Masera… - IEEE …, 2020 - ieeexplore.ieee.org
Machine Learning (ML) is becoming ubiquitous in everyday life. Deep Learning
(DL) is already present in many applications ranging from computer vision for medicine to …

ConfuciuX: Autonomous hardware resource assignment for DNN accelerators using reinforcement learning

SC Kao, G Jeong, T Krishna - 2020 53rd Annual IEEE/ACM …, 2020 - ieeexplore.ieee.org
DNN accelerators provide efficiency by leveraging reuse of activations/weights/outputs
during the DNN computations to reduce data movement from DRAM to the chip. The reuse is …
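
To make the reuse concrete, here is a minimal back-of-the-envelope sketch (mine, not the paper's model): counting DRAM weight fetches for a small matrix multiply with and without a weight-stationary on-chip buffer. All sizes and names are illustrative.

    import numpy as np

    M, K, N = 8, 8, 8          # output rows, inner dimension, output cols
    A = np.random.rand(M, K)   # activations
    W = np.random.rand(K, N)   # weights
    C = A @ W                  # the computation being accelerated

    # No reuse: every one of the M*K*N MACs refetches its weight from DRAM.
    naive_fetches = M * K * N

    # Weight-stationary reuse: each weight is loaded on chip once and then
    # reused across all M rows of activations.
    stationary_fetches = K * N

    print(naive_fetches, stationary_fetches)   # 512 vs. 64: an 8x reduction

Per its title, the paper's subject is assigning such hardware resources automatically with reinforcement learning; the sketch only shows why reuse pays off.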

Hardware acceleration of sparse and irregular tensor computations of ML models: A survey and insights

S Dave, R Baghdadi, T Nowatzki… - Proceedings of the …, 2021 - ieeexplore.ieee.org
Machine learning (ML) models are widely used in many important domains. For efficiently
processing these computational- and memory-intensive applications, tensors of these …
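
As a concrete instance of the sparse-tensor processing such surveys cover (sketch mine, not the survey's), a compressed sparse row (CSR) encoding stores only nonzero weights, letting a matrix-vector product skip zeros entirely:

    import numpy as np

    W = np.array([[0., 2., 0.],
                  [1., 0., 0.],
                  [0., 0., 3.]])

    # CSR: nonzero values, their column indices, and per-row start offsets.
    values, col_idx, row_ptr = [], [], [0]
    for row in W:
        for j, v in enumerate(row):
            if v != 0.0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))

    x = np.array([1., 2., 3.])
    y = np.zeros(W.shape[0])
    for i in range(W.shape[0]):
        for k in range(row_ptr[i], row_ptr[i + 1]):   # touch nonzeros only
            y[i] += values[k] * x[col_idx[k]]

    assert np.allclose(y, W @ x)                      # [4. 1. 9.]

The irregularity named in the title is visible even here: the per-row trip count varies with the data, which is what makes such computations hard to map onto regular hardware.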

A multi-neural network acceleration architecture

E Baek, D Kwon, J Kim - 2020 ACM/IEEE 47th Annual …, 2020 - ieeexplore.ieee.org
Cost-effective multi-tenant neural network execution is becoming one of the most important
design goals for modern neural network accelerators. For example, as emerging AI services …

Procrustes: A dataflow and accelerator for sparse deep neural network training

D Yang, A Ghasemazar, X Ren, M Golub… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
The success of DNN pruning has led to the development of energy-efficient inference
accelerators that support pruned models with sparse weight and activation tensors. Because …
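
For context on where those sparse tensors come from, a minimal magnitude-pruning sketch (illustrative only; Procrustes itself concerns the hardware dataflow for training such models, not this software step):

    import numpy as np

    rng = np.random.default_rng(0)
    W = rng.normal(size=(4, 4))

    sparsity = 0.75                                # fraction of weights to zero
    threshold = np.quantile(np.abs(W), sparsity)
    mask = np.abs(W) >= threshold                  # keep only the largest weights
    W_pruned = W * mask

    print(int(mask.sum()), "of", W.size, "weights kept")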

Laconic deep learning inference acceleration

S Sharify, AD Lascorz, M Mahmoud, M Nikolic… - Proceedings of the 46th …, 2019 - dl.acm.org
We present a method for transparently identifying ineffectual computations during inference
with Deep Learning models. Specifically, by decomposing multiplications down to the bit …
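
A hedged sketch of the general principle (not Laconic's exact scheme): a multiplication decomposes into one shift-and-add term per set bit of the multiplier, so zero bits contribute nothing, and hardware that skips them avoids ineffectual work.

    def bit_serial_multiply(a: int, b: int):
        """Return a*b plus the number of effectual (nonzero-bit) terms."""
        product, effectual = 0, 0
        for pos in range(b.bit_length()):
            if (b >> pos) & 1:           # a zero bit would add nothing
                product += a << pos      # one shift-and-add per set bit
                effectual += 1
        return product, effectual

    prod, terms = bit_serial_multiply(25, 18)   # 18 = 0b10010: two set bits
    assert prod == 25 * 18
    print(prod, terms)                          # 450, 2 terms for 5 bit positions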

Gemmini: An agile systolic array generator enabling systematic evaluations of deep-learning architectures

H Genc, A Haj-Ali, V Iyer, A Amid, H Mao… - arXiv preprint arXiv …, 2019 - alonamid.github.io
Advances in deep learning and neural networks have resulted in rapid development of
hardware accelerators that support them. A large majority of ASIC accelerators, however …

FlexCNN: An end-to-end framework for composing CNN accelerators on FPGA

S Basalama, A Sohrabizadeh, J Wang, L Guo… - ACM Transactions on …, 2023 - dl.acm.org
With reduced data reuse and parallelism, recent convolutional neural networks (CNNs)
create new challenges for FPGA acceleration. Systolic arrays (SAs) are efficient, scalable …
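
For readers unfamiliar with the SA dataflow, here is a minimal cycle-by-cycle model of an output-stationary systolic array (an illustrative simulation, not FlexCNN's generated hardware): inputs are skewed so the k-th operand pair for output (i, j) reaches its processing element at cycle i + j + k.

    import numpy as np

    def systolic_matmul(A, B):
        """Simulate an output-stationary systolic array computing A @ B."""
        M, K = A.shape
        K2, N = B.shape
        assert K == K2
        C = np.zeros((M, N))                 # one accumulator per PE
        for t in range(M + N + K - 2):       # total pipeline cycles
            for i in range(M):
                for j in range(N):
                    k = t - i - j            # skewed arrival schedule
                    if 0 <= k < K:
                        C[i, j] += A[i, k] * B[k, j]
        return C

    A = np.arange(6.0).reshape(2, 3)
    B = np.arange(12.0).reshape(3, 4)
    assert np.allclose(systolic_matmul(A, B), A @ B)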

Review and benchmarking of precision-scalable multiply-accumulate unit architectures for embedded neural-network processing

V Camus, L Mei, C Enz… - IEEE Journal on Emerging …, 2019 - ieeexplore.ieee.org
The current trend in deep learning comes with an enormous computational need:
billions of Multiply-Accumulate (MAC) operations per inference. Fortunately, reduced …
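
To illustrate the principle behind precision scalability (my sketch, not one of the benchmarked architectures): one 8-bit multiply can be assembled from four 4-bit sub-products, so the same datapath can alternatively serve several independent low-precision MACs when a model tolerates reduced precision.

    def mul8_from_4bit(a: int, b: int) -> int:
        """Build an 8-bit x 8-bit product from four 4-bit x 4-bit multiplies."""
        a_hi, a_lo = a >> 4, a & 0xF
        b_hi, b_lo = b >> 4, b & 0xF
        # Four partial products, shifted to their binary weight and summed.
        return ((a_hi * b_hi) << 8) + ((a_hi * b_lo) << 4) \
             + ((a_lo * b_hi) << 4) + (a_lo * b_lo)

    for a, b in [(200, 123), (15, 255), (0, 77)]:
        assert mul8_from_4bit(a, b) == a * b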

dMazeRunner: Executing perfectly nested loops on dataflow accelerators

S Dave, Y Kim, S Avancha, K Lee… - ACM Transactions on …, 2019 - dl.acm.org
Dataflow accelerators feature simplicity, programmability, and energy-efficiency and are
envisioned as a promising architecture for accelerating perfectly nested loops that dominate …
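
As an example of the target workload (sketch and sizes mine): direct 2-D convolution is exactly such a perfectly nested loop, seven loops around a single multiply-accumulate statement.

    import numpy as np

    N, C, H, W = 1, 2, 5, 5        # batch, input channels, height, width
    M, R, S = 3, 3, 3              # output channels, kernel height, kernel width
    I = np.random.rand(N, C, H, W)
    F = np.random.rand(M, C, R, S)
    O = np.zeros((N, M, H - R + 1, W - S + 1))

    for n in range(N):
        for m in range(M):
            for y in range(H - R + 1):
                for x in range(W - S + 1):
                    for c in range(C):
                        for r in range(R):
                            for s in range(S):
                                O[n, m, y, x] += I[n, c, y + r, x + s] * F[m, c, r, s]

How such a nest is tiled, reordered, and spatially unrolled across processing elements is the mapping space that dataflow-accelerator tools explore.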