A Survey of Design and Optimization for Systolic Array-based DNN Accelerators

R Xu, S Ma, Y Guo, D Li - ACM Computing Surveys, 2023 - dl.acm.org
In recent years, the systolic array has proven to be a successful architecture for
DNN hardware accelerators. However, the design of systolic arrays also encounters many …
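
For intuition about what these accelerators compute, below is a minimal cycle-level sketch (in NumPy, with made-up names) of an output-stationary systolic array: each processing element (PE) owns one output accumulator, operands enter skewed at the array edges, and values hop one PE per cycle. It is an illustrative toy, not a design taken from the survey.

```python
import numpy as np

def systolic_matmul(A, B):
    """Cycle-level sketch of an output-stationary systolic array.

    PE (i, j) holds the accumulator for C[i, j]; rows of A stream in
    from the left (row i skewed by i cycles) and columns of B from the
    top (column j skewed by j cycles), so A[i, k] and B[k, j] meet in
    PE (i, j) at cycle i + j + k. Illustrative toy, not a real design.
    """
    M, K = A.shape
    K2, N = B.shape
    assert K == K2
    acc = np.zeros((M, N))                 # one accumulator per PE
    a_reg = np.zeros((M, N))               # operand latched in each PE
    b_reg = np.zeros((M, N))
    for t in range(M + N + K - 2):         # cycles until the array drains
        # operands hop one PE right/down per cycle (update in reverse)
        for i in range(M):
            for j in range(N - 1, 0, -1):
                a_reg[i, j] = a_reg[i, j - 1]
        for j in range(N):
            for i in range(M - 1, 0, -1):
                b_reg[i, j] = b_reg[i - 1, j]
        # inject skewed inputs at the left and top edges
        for i in range(M):
            k = t - i
            a_reg[i, 0] = A[i, k] if 0 <= k < K else 0.0
        for j in range(N):
            k = t - j
            b_reg[0, j] = B[k, j] if 0 <= k < K else 0.0
        acc += a_reg * b_reg               # every PE does one MAC per cycle
    return acc

A, B = np.random.randn(3, 5), np.random.randn(5, 4)
assert np.allclose(systolic_matmul(A, B), A @ B)
```

Each PE talks only to its immediate neighbors and performs one multiply-accumulate per cycle; that regularity is what makes the array cheap to lay out in silicon, and it is also the source of the design constraints the survey discusses.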

FACT: FFN-Attention Co-optimized Transformer Architecture with Eager Correlation Prediction

Y Qin, Y Wang, D Deng, Z Zhao, X Yang, L Liu… - Proceedings of the 50th …, 2023 - dl.acm.org
The Transformer model is becoming prevalent in various AI applications thanks to its outstanding
performance. However, its high computation cost and memory footprint make its …

HighLight: Efficient and Flexible DNN Acceleration with Hierarchical Structured Sparsity

YN Wu, PA Tsai, S Muralidharan, A Parashar… - Proceedings of the 56th …, 2023 - dl.acm.org
Due to complex interactions among various deep neural network (DNN) optimization
techniques, modern DNNs can have weights and activations that are dense or sparse with …

Reconfigurability, Why It Matters in AI Tasks Processing: A Survey of Reconfigurable AI Chips

S Wei, X Lin, F Tu, Y Wang, L Liu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Nowadays, artificial intelligence (AI) technologies, especially deep neural networks (DNNs),
play a vital role in solving many problems in both academia and industry. In order to …

ELSA: Exploiting Layer-wise N:M Sparsity for Vision Transformer Acceleration

NC Huang, CC Chang, WC Lin… - Proceedings of the …, 2024 - openaccess.thecvf.com
N:M sparsity is an emerging model compression method supported by more and more
accelerators to speed up sparse matrix multiplication in deep neural networks. Most existing …
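
For concreteness, here is a minimal NumPy sketch of the magnitude-based pruning step behind the N:M pattern (2:4 shown, the ratio NVIDIA's sparse tensor cores support): in every group of M consecutive weights, only the N largest-magnitude entries survive. The helper name is hypothetical, and ELSA's layer-wise choice of N:M ratios goes beyond this.

```python
import numpy as np

def prune_n_of_m(weights, n=2, m=4):
    """Keep the n largest-magnitude weights in every group of m.

    Enforces N:M structured sparsity along the last axis: the nonzero
    budget per group is fixed, so hardware can store just the surviving
    values plus small per-group indices and skip zeros deterministically.
    Minimal sketch, not ELSA's layer-wise N:M selection.
    """
    w = weights.reshape(-1, m)                        # one row per group
    drop = np.argsort(np.abs(w), axis=1)[:, : m - n]  # smallest magnitudes
    mask = np.ones_like(w, dtype=bool)
    np.put_along_axis(mask, drop, False, axis=1)
    return (w * mask).reshape(weights.shape)

w = np.random.randn(8, 16)
ws = prune_n_of_m(w)
assert (ws.reshape(-1, 4) != 0).sum(axis=1).max() <= 2  # ≤2 nonzeros per 4
```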

VEGETA: Vertically-Integrated Extensions for Sparse/Dense GEMM Tile Acceleration on CPUs

G Jeong, S Damani, AR Bambhaniya… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Deep Learning (DL) acceleration support in CPUs has recently gained a lot of traction, with
several companies (Arm, Intel, IBM) announcing products with specialized matrix engines …

RM-STC: Row-Merge Dataflow Inspired GPU Sparse Tensor Core for Energy-Efficient Sparse Acceleration

G Huang, Z Wang, PA Tsai, C Zhang, Y Ding… - Proceedings of the 56th …, 2023 - dl.acm.org
This paper proposes RM-STC, a novel GPU tensor core architecture designed for sparse
Deep Neural Networks (DNNs) with two key innovations: (1) native support for both training …

Automated HW/SW Co-design for Edge AI: State, Challenges and Steps Ahead

O Bringmann, W Ecker, I Feldner… - Proceedings of the …, 2021 - dl.acm.org
Gigantic rates of data production in the era of Big Data, the Internet of Things (IoT), and Smart
Cyber-Physical Systems (CPS) pose incessantly escalating demands for massive data …

PDP: Parameter-free Differentiable Pruning is All You Need

M Cho, S Adya, D Naik - Advances in Neural Information …, 2024 - proceedings.neurips.cc
DNN pruning is a popular way to reduce the size of a model, improve the inference latency,
and minimize the power consumption on DNN accelerators. However, existing approaches …
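
As a rough illustration of what "differentiable pruning" means, here is a generic soft-mask sketch in NumPy: each weight is scaled by a smooth 0-to-1 function of its magnitude, so inside an autodiff framework gradients could flow through the mask during training. The names, threshold rule, and temperature are toy assumptions, not PDP's parameter-free formulation.

```python
import numpy as np

def soft_prune(weights, sparsity=0.7, temp=0.01):
    """Differentiable stand-in for hard magnitude pruning.

    Each weight is scaled by a sigmoid of how far its magnitude sits
    above a threshold picked to hit the target sparsity. The mask is
    smooth, so an autodiff framework could push gradients through it
    during training; at deployment it would be rounded to a hard 0/1
    mask. Generic toy, not PDP's parameter-free formulation.
    """
    mags = np.abs(weights)
    thresh = np.quantile(mags, sparsity)   # magnitude at the cut point
    mask = 1.0 / (1.0 + np.exp(-(mags - thresh) / temp))
    return weights * mask

w = np.random.randn(10_000)
wp = soft_prune(w)
print((np.abs(wp) < 1e-3).mean())  # close to (not exactly) the 0.7 target
```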

ETTE: Efficient Tensor-Train-based Computing Engine for Deep Neural Networks

Y Gong, M Yin, L Huang, J Xiao, Y Sui, C Deng… - Proceedings of the 50th …, 2023 - dl.acm.org
Tensor-train (TT) decomposition enables an ultra-high compression ratio, making deep
neural network (DNN) accelerators based on this method very attractive. TIE, the state-of-the …
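
To make the compression-ratio claim concrete, here is a small NumPy sketch that rebuilds a full tensor from its TT cores and counts stored values; the shapes, ranks, and helper name are made-up toy choices, not ETTE's (or TIE's) computing scheme.

```python
import numpy as np

def tt_reconstruct(cores):
    """Rebuild a full tensor from tensor-train (TT) cores.

    Core k has shape (r_{k-1}, n_k, r_k) with r_0 = r_d = 1, and entry
    T[i1, ..., id] is the matrix product cores[0][:, i1, :] @ ... @
    cores[d-1][:, id, :]. Storage falls from prod(n_k) values to
    sum(r_{k-1} * n_k * r_k), which is where the compression comes from.
    """
    t = cores[0]                                   # shape (1, n_1, r_1)
    for core in cores[1:]:
        # contract the trailing rank axis with the next core's leading one
        t = np.tensordot(t, core, axes=(t.ndim - 1, 0))
    return t.squeeze(axis=(0, -1))                 # drop unit boundary ranks

# toy 4x4x4x4 tensor with TT-ranks (1, 2, 2, 2, 1)
shapes = [(1, 4, 2), (2, 4, 2), (2, 4, 2), (2, 4, 1)]
cores = [np.random.randn(*s) for s in shapes]
full = tt_reconstruct(cores)
stored = sum(c.size for c in cores)
print(full.shape, f"-> {full.size} values from {stored} stored parameters")
```

Here a 256-element tensor is represented by 48 stored values; for realistic layer shapes and low TT-ranks the gap can grow dramatically, which is what makes TT-based accelerators attractive.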