Programming and synthesis for software-defined FPGA acceleration: status and future prospects
FPGA-based accelerators are increasingly popular across a broad range of applications,
because they offer massive parallelism, high energy efficiency, and great flexibility for …
AutoSA: A polyhedral compiler for high-performance systolic arrays on FPGA
While systolic array architectures have the potential to deliver tremendous performance, it is
notoriously challenging to customize an efficient systolic array processor for a target …
CHARM: Composing Heterogeneous AcceleRators for Matrix Multiply on Versal ACAP Architecture
Dense matrix multiply (MM) serves as one of the most heavily used kernels in deep learning
applications. To cope with the high computation demands of these applications …
PolySA: Polyhedral-based systolic array auto-compilation
J Cong, J Wang - 2018 IEEE/ACM International Conference on …, 2018 - ieeexplore.ieee.org
Automatic systolic array generation has long been an interesting topic due to the need to
reduce the lengthy development cycles of manual designs. Existing automatic systolic array …
SODA: Stencil with optimized dataflow architecture
Stencil computation is one of the most important kernels in many application domains such
as image processing, solving partial differential equations, and cellular automata. Many of …
AutoBridge: Coupling coarse-grained floorplanning and pipelining for high-frequency HLS design on multi-die FPGAs
Despite an increasing adoption of high-level synthesis (HLS) for its design productivity
advantages, there remains a significant gap in the achievable clock frequency between an …
RapidStream: parallel physical implementation of FPGA HLS designs
FPGAs require a much longer compilation cycle than conventional computing platforms like
CPUs. In this paper, we shorten the overall compilation time by co-optimizing the HLS …
Sextans: A streaming accelerator for general-purpose sparse-matrix dense-matrix multiplication
Sparse-Matrix Dense-Matrix multiplication (SpMM) is the key operator for a wide range of
applications including scientific computing, graph processing, and deep learning …
Extending high-level synthesis for task-parallel programs
C/C++/OpenCL-based high-level synthesis (HLS) becomes more and more popular for field-
programmable gate array (FPGA) accelerators in many application domains in recent years …
CHARM 2.0: Composing Heterogeneous Accelerators for Deep Learning on Versal ACAP Architecture
Dense matrix multiply (MM) serves as one of the most heavily used kernels in deep learning
applications. To cope with the high computation demands of these applications …