Programming and synthesis for software-defined FPGA acceleration: status and future prospects

YH Lai, E Ustun, S Xiang, Z Fang, H Rong… - ACM Transactions on …, 2021 - dl.acm.org
FPGA-based accelerators are increasingly popular across a broad range of applications,
because they offer massive parallelism, high energy efficiency, and great flexibility for …

AutoSA: A polyhedral compiler for high-performance systolic arrays on FPGA

J Wang, L Guo, J Cong - The 2021 ACM/SIGDA International Symposium …, 2021 - dl.acm.org
While systolic array architectures have the potential to deliver tremendous performance, it is
notoriously challenging to customize an efficient systolic array processor for a target …

CHARM: Composing Heterogeneous AcceleRators for Matrix Multiply on Versal ACAP Architecture

J Zhuang, J Lau, H Ye, Z Yang, Y Du, J Lo… - Proceedings of the …, 2023 - dl.acm.org
Dense matrix multiply (MM) serves as one of the most heavily used kernels in deep learning
applications. To cope with the high computation demands of these applications …

PolySA: Polyhedral-based systolic array auto-compilation

J Cong, J Wang - 2018 IEEE/ACM International Conference on …, 2018 - ieeexplore.ieee.org
Automatic systolic array generation has long been an interesting topic due to the need to
reduce the lengthy development cycles of manual designs. Existing automatic systolic array …

SODA: Stencil with optimized dataflow architecture

Y Chi, J Cong, P Wei, P Zhou - 2018 IEEE/ACM International …, 2018 - ieeexplore.ieee.org
Stencil computation is one of the most important kernels in many application domains such
as image processing, solving partial differential equations, and cellular automata. Many of …

AutoBridge: Coupling coarse-grained floorplanning and pipelining for high-frequency HLS design on multi-die FPGAs

L Guo, Y Chi, J Wang, J Lau, W Qiao, E Ustun… - The 2021 ACM/SIGDA …, 2021 - dl.acm.org
Despite an increasing adoption of high-level synthesis (HLS) for its design productivity
advantages, there remains a significant gap in the achievable clock frequency between an …

RapidStream: parallel physical implementation of FPGA HLS designs

L Guo, P Maidee, Y Zhou, C Lavin, J Wang… - Proceedings of the …, 2022 - dl.acm.org
FPGAs require a much longer compilation cycle than conventional computing platforms like
CPUs. In this paper, we shorten the overall compilation time by co-optimizing the HLS …

Sextans: A streaming accelerator for general-purpose sparse-matrix dense-matrix multiplication

L Song, Y Chi, A Sohrabizadeh, Y Choi, J Lau… - Proceedings of the …, 2022 - dl.acm.org
Sparse-Matrix Dense-Matrix multiplication (SpMM) is the key operator for a wide range of
applications including scientific computing, graph processing, and deep learning …

Extending high-level synthesis for task-parallel programs

Y Chi, L Guo, J Lau, Y Choi, J Wang… - 2021 IEEE 29th Annual …, 2021 - ieeexplore.ieee.org
C/C++/OpenCL-based high-level synthesis (HLS) becomes more and more popular for
field-programmable gate array (FPGA) accelerators in many application domains in recent years …

CHARM 2.0: Composing Heterogeneous Accelerators for Deep Learning on Versal ACAP Architecture

J Zhuang, J Lau, H Ye, Z Yang, S Ji, J Lo… - ACM Transactions on …, 2024 - dl.acm.org
Dense matrix multiply (MM) serves as one of the most heavily used kernels in deep learning
applications. To cope with the high computation demands of these applications …