AutoSA: A polyhedral compiler for high-performance systolic arrays on FPGA

J Wang, L Guo, J Cong - The 2021 ACM/SIGDA International Symposium …, 2021 - dl.acm.org
While systolic array architectures have the potential to deliver tremendous performance, it is
notoriously challenging to customize an efficient systolic array processor for a target …

AutoBridge: Coupling coarse-grained floorplanning and pipelining for high-frequency HLS design on multi-die FPGAs

L Guo, Y Chi, J Wang, J Lau, W Qiao, E Ustun… - The 2021 ACM/SIGDA …, 2021 - dl.acm.org
Despite an increasing adoption of high-level synthesis (HLS) for its design productivity
advantages, there remains a significant gap in the achievable clock frequency between an …

HBM Connect: High-performance HLS interconnect for FPGA HBM

Y Choi, Y Chi, W Qiao, N Samardzic… - The 2021 ACM/SIGDA …, 2021 - dl.acm.org
With the recent release of High Bandwidth Memory (HBM) based FPGA boards, developers
can now exploit unprecedented external memory bandwidth. This allows more memory …

GME: GPU-based microarchitectural extensions to accelerate homomorphic encryption

K Shivdikar, Y Bao, R Agrawal, M Shen… - Proceedings of the 56th …, 2023 - dl.acm.org
Fully Homomorphic Encryption (FHE) enables the processing of encrypted data without
decrypting it. FHE has garnered significant attention over the past decade as it supports …

Sextans: A streaming accelerator for general-purpose sparse-matrix dense-matrix multiplication

L Song, Y Chi, A Sohrabizadeh, Y Choi, J Lau… - Proceedings of the …, 2022 - dl.acm.org
Sparse-Matrix Dense-Matrix multiplication (SpMM) is the key operator for a wide range of
applications including scientific computing, graph processing, and deep learning …

FleetRec: Large-scale recommendation inference on hybrid GPU-FPGA clusters

W Jiang, Z He, S Zhang, K Zeng, L Feng… - Proceedings of the 27th …, 2021 - dl.acm.org
We present FleetRec, a high-performance and scalable recommendation inference system
within tight latency constraints. FleetRec takes advantage of heterogeneous hardware …

Accelerating SSSP for power-law graphs

Y Chi, L Guo, J Cong - Proceedings of the 2022 ACM/SIGDA …, 2022 - dl.acm.org
The single-source shortest path (SSSP) problem is one of the most important and
well-studied graph problems widely used in many application domains, such as road navigation …

Shuhai: A tool for benchmarking high bandwidth memory on FPGAs

H Huang, Z Wang, J Zhang, Z He, C Wu… - IEEE Transactions …, 2021 - ieeexplore.ieee.org
FPGAs are starting to incorporate High Bandwidth Memory (HBM) to both reduce the
memory bandwidth bottleneck encountered in some applications and to provide more …

Automatic creation of high-bandwidth memory architectures from domain-specific languages: The case of computational fluid dynamics

S Soldavini, K Friebel, M Tibaldi, G Hempel… - ACM Transactions on …, 2023 - dl.acm.org
Numerical simulations can help solve complex problems. Most of these algorithms are
massively parallel and thus good candidates for FPGA acceleration thanks to spatial …

A survey of FPGA optimization methods for data center energy efficiency

M Tibaldi, C Pilato - IEEE Transactions on Sustainable …, 2023 - ieeexplore.ieee.org
This article provides a survey of academic literature about field programmable gate array
(FPGA) and their utilization for energy efficiency acceleration in data centers. The goal is to …