Efficient acceleration of deep learning inference on resource-constrained edge devices: A review
Successful integration of deep neural networks (DNNs) or deep learning (DL) has resulted
in breakthroughs in many areas. However, deploying these highly accurate models for data …
FPGA HLS today: successes, challenges, and opportunities
The year 2011 marked an important transition for FPGA high-level synthesis (HLS), as it
went from prototyping to deployment. A decade later, in this article, we assess the progress …
Heterogeneous parallelization and acceleration of molecular dynamics simulations in GROMACS
The introduction of accelerator devices such as graphics processing units (GPUs) has had
profound impact on molecular dynamics simulations and has enabled order-of-magnitude …
MatRaptor: A sparse-sparse matrix multiplication accelerator based on row-wise product
Sparse-sparse matrix multiplication (SpGEMM) is a computation kernel widely used in
numerous application domains such as data analytics, graph processing, and scientific …
AutoSA: A polyhedral compiler for high-performance systolic arrays on FPGA
While systolic array architectures have the potential to deliver tremendous performance, it is
notoriously challenging to customize an efficient systolic array processor for a target …
A Survey of Design and Optimization for Systolic Array-based DNN Accelerators
In recent years, it has been witnessed that the systolic array is a successful architecture for
DNN hardware accelerators. However, the design of systolic arrays also encountered many …
Tensaurus: A versatile accelerator for mixed sparse-dense tensor computations
Tensor factorizations are powerful tools in many machine learning and data analytics
applications. Tensors are often sparse, which makes sparse tensor factorizations memory …
Allo: A programming model for composable accelerator design
Special-purpose hardware accelerators are increasingly pivotal for sustaining performance
improvements in emerging applications, especially as the benefits of technology scaling …
Sparseloop: An analytical approach to sparse tensor accelerator modeling
In recent years, many accelerators have been proposed to efficiently process sparse tensor
algebra applications (e.g., sparse neural networks). However, these proposals are single …
Hardware acceleration of sparse and irregular tensor computations of ML models: A survey and insights
Machine learning (ML) models are widely used in many important domains. For efficiently
processing these computational- and memory-intensive applications, tensors of these …