A survey of design and optimization for systolic array-based DNN accelerators

R Xu, S Ma, Y Guo, D Li - ACM Computing Surveys, 2023 - dl.acm.org
In recent years, the systolic array has proven to be a successful architecture for
DNN hardware accelerators. However, the design of systolic arrays has also encountered many …
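
To make the dataflow concrete, here is a minimal Python sketch of the weight-stationary systolic idea this survey covers: each processing element PE(i, j) holds one weight, activations stream across rows, and partial sums accumulate down columns. This is a functional model only (no cycle-level skewing), and all names in it are illustrative rather than taken from the paper.

    # Functional sketch of weight-stationary systolic matrix-vector multiply.
    # PE(i, j) holds W[i][j]; activation x[i] streams across row i, and each
    # column j accumulates a partial sum flowing downward.
    def systolic_matvec(x, W):
        rows, cols = len(W), len(W[0])
        psum = [0] * cols                  # one partial sum per column
        for i in range(rows):              # x[i] enters row i ...
            for j in range(cols):          # ... and is reused by every PE in it
                psum[j] += x[i] * W[i][j]  # PE(i, j): multiply-accumulate
        return psum                        # y[j] = sum_i x[i] * W[i][j]

    x = [1, 2, 3]
    W = [[1, 0], [0, 1], [1, 1]]
    print(systolic_matvec(x, W))           # [4, 5]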

Full stack optimization of Transformer inference: a survey

S Kim, C Hooper, T Wattanawong, M Kang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent advances in state-of-the-art DNN architecture design have been moving toward
Transformer models. These models achieve superior accuracy across a wide range of …

Sparseloop: An analytical approach to sparse tensor accelerator modeling

YN Wu, PA Tsai, A Parashar, V Sze… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
In recent years, many accelerators have been proposed to efficiently process sparse tensor
algebra applications (e.g., sparse neural networks). However, these proposals are single …
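
A toy example of what "analytical" modeling means here (the formula and numbers are illustrative assumptions, not Sparseloop's actual cost model): under independent random sparsity, a multiply is effectual only when both operands are nonzero, so expected work scales with the product of the operand densities.

    # First-order estimate of effectual MACs in an M x K x N matmul with
    # operand densities d_a and d_b (illustrative, not Sparseloop's model).
    def expected_macs(M, K, N, d_a, d_b):
        return M * K * N * d_a * d_b

    # A 30%-dense operand times a 50%-dense operand needs ~15% of dense work.
    print(expected_macs(128, 128, 128, 0.3, 0.5) / 128**3)  # 0.15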

S2TA: Exploiting structured sparsity for energy-efficient mobile CNN acceleration

ZG Liu, PN Whatmough, Y Zhu… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Exploiting sparsity is a key technique in accelerating quantized convolutional neural network
(CNN) inference on mobile devices. Prior sparse CNN accelerators largely exploit …
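
For intuition, the sketch below prunes weights to a density-bound block pattern, in the spirit of the structured sparsity such accelerators exploit; the block size and keep-count are hypothetical parameters, not necessarily S2TA's configuration.

    # Keep only the `keep` largest-magnitude weights in each block of size
    # `block`, zeroing the rest, so hardware can bound nonzeros per block.
    def prune_blocks(weights, block=8, keep=2):
        pruned = []
        for i in range(0, len(weights), block):
            blk = weights[i:i + block]
            threshold = sorted((abs(w) for w in blk), reverse=True)[keep - 1]
            pruned += [w if abs(w) >= threshold else 0.0 for w in blk]
        return pruned

    w = [0.9, -0.1, 0.05, -0.8, 0.2, 0.01, -0.3, 0.4]
    print(prune_blocks(w))  # [0.9, 0.0, 0.0, -0.8, 0.0, 0.0, 0.0, 0.0]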

Freely scalable and reconfigurable optical hardware for deep learning

L Bernstein, A Sludds, R Hamerly, V Sze, J Emer… - Scientific reports, 2021 - nature.com
As deep neural network (DNN) models grow ever larger, they can achieve higher accuracy
and solve more complex problems. This trend has been enabled by an increase in available …

LLMCompass: Enabling efficient hardware design for large language model inference

H Zhang, A Ning, RB Prabhakar… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
The past year has witnessed the increasing popularity of Large Language Models (LLMs).
Their unprecedented scale and associated high hardware cost have impeded their broader …

Transform quantization for CNN compression

SI Young, W Zhe, D Taubman… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
In this paper, we compress convolutional neural network (CNN) weights post-training via
transform quantization. Previous CNN quantization techniques tend to ignore the joint …
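
The snippet below shows only the generic uniform post-training quantization step; the paper's contribution, jointly optimizing a decorrelating transform together with the quantizer, is omitted here, so treat this as background rather than the authors' method.

    # Uniform affine quantization of a weight vector to `bits` bits,
    # returning integer codes and their dequantized reconstruction.
    def quantize(weights, bits=8):
        lo, hi = min(weights), max(weights)
        scale = (hi - lo) / (2**bits - 1)
        codes = [round((w - lo) / scale) for w in weights]
        recon = [lo + c * scale for c in codes]
        return codes, recon

    codes, recon = quantize([-0.51, 0.02, 0.37, 1.24], bits=4)
    print(codes)                          # [0, 5, 8, 15]
    print([round(v, 3) for v in recon])   # [-0.51, 0.073, 0.423, 1.24]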

Automatic domain-specific SoC design for autonomous unmanned aerial vehicles

S Krishnan, Z Wan, K Bhardwaj… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
Building domain-specific accelerators is becoming increasingly important for meeting high-
performance requirements under stringent power and real-time constraints. However …

TileFlow: A framework for modeling fusion dataflow via tree-based analysis

S Zheng, S Chen, S Gao, L Jia, G Sun… - Proceedings of the 56th …, 2023 - dl.acm.org
With the increasing size of DNN models and the growing discrepancy between compute
performance and memory bandwidth, fusing multiple layers together to reduce off-chip …
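
A back-of-envelope accounting (assumed sizes, not TileFlow's model) of why fusion helps: an unfused schedule round-trips the intermediate tensor through DRAM, while a fused schedule keeps each intermediate tile on chip, so the saving is roughly one write plus one read of that tensor.

    # Off-chip traffic attributable to the intermediate tensor between two
    # layers; input/output traffic is the same under both schedules.
    def intermediate_traffic(intermediate_bytes, fused):
        return 0 if fused else 2 * intermediate_bytes  # write + read back

    inter = 4 * 1024 * 1024  # assume a 4 MiB intermediate activation tensor
    saved = intermediate_traffic(inter, False) - intermediate_traffic(inter, True)
    print(saved // (1024 * 1024), "MiB of DRAM traffic avoided by fusing")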

Technology prospects for data-intensive computing

K Akarvardar, HSP Wong - Proceedings of the IEEE, 2023 - ieeexplore.ieee.org
For many decades, progress in computing hardware has been closely associated with
CMOS logic density, performance, and cost. As such, slowdown in 2-D scaling, frequency …