Efficient acceleration of deep learning inference on resource-constrained edge devices: A review

MMH Shuvo, SK Islam, J Cheng… - Proceedings of the …, 2022 - ieeexplore.ieee.org
Successful integration of deep neural networks (DNNs) or deep learning (DL) has resulted
in breakthroughs in many areas. However, deploying these highly accurate models for data …

FPGA HLS today: successes, challenges, and opportunities

J Cong, J Lau, G Liu, S Neuendorffer, P Pan… - ACM Transactions on …, 2022 - dl.acm.org
The year 2011 marked an important transition for FPGA high-level synthesis (HLS), as it
went from prototyping to deployment. A decade later, in this article, we assess the progress …

Heterogeneous parallelization and acceleration of molecular dynamics simulations in GROMACS

S Páll, A Zhmurov, P Bauer, M Abraham… - The Journal of …, 2020 - pubs.aip.org
The introduction of accelerator devices such as graphics processing units (GPUs) has had
profound impact on molecular dynamics simulations and has enabled order-of-magnitude …

Matraptor: A sparse-sparse matrix multiplication accelerator based on row-wise product

N Srivastava, H Jin, J Liu, D Albonesi… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
Sparse-sparse matrix multiplication (SpGEMM) is a computation kernel widely used in
numerous application domains such as data analytics, graph processing, and scientific …

AutoSA: A polyhedral compiler for high-performance systolic arrays on FPGA

J Wang, L Guo, J Cong - The 2021 ACM/SIGDA International Symposium …, 2021 - dl.acm.org
While systolic array architectures have the potential to deliver tremendous performance, it is
notoriously challenging to customize an efficient systolic array processor for a target …

A Survey of Design and Optimization for Systolic Array-based DNN Accelerators

R Xu, S Ma, Y Guo, D Li - ACM Computing Surveys, 2023 - dl.acm.org
In recent years, it has been witnessed that the systolic array is a successful architecture for
DNN hardware accelerators. However, the design of systolic arrays also encountered many …

Tensaurus: A versatile accelerator for mixed sparse-dense tensor computations

N Srivastava, H Jin, S Smith, H Rong… - … Symposium on High …, 2020 - ieeexplore.ieee.org
Tensor factorizations are powerful tools in many machine learning and data analytics
applications. Tensors are often sparse, which makes sparse tensor factorizations memory …

Allo: A programming model for composable accelerator design

H Chen, N Zhang, S Xiang, Z Zeng, M Dai… - Proceedings of the ACM …, 2024 - dl.acm.org
Special-purpose hardware accelerators are increasingly pivotal for sustaining performance
improvements in emerging applications, especially as the benefits of technology scaling …

Sparseloop: An analytical approach to sparse tensor accelerator modeling

YN Wu, PA Tsai, A Parashar, V Sze… - 2022 55th IEEE/ACM …, 2022 - ieeexplore.ieee.org
In recent years, many accelerators have been proposed to efficiently process sparse tensor
algebra applications (e.g., sparse neural networks). However, these proposals are single …

Hardware acceleration of sparse and irregular tensor computations of ML models: A survey and insights

S Dave, R Baghdadi, T Nowatzki… - Proceedings of the …, 2021 - ieeexplore.ieee.org
Machine learning (ML) models are widely used in many important domains. For efficiently
processing these computational- and memory-intensive applications, tensors of these …