FpgaNIC: An FPGA-based versatile 100Gb SmartNIC for GPUs
Given that the increasing rate of network bandwidth is far ahead of that of the compute
capacity of host CPU, which by default processes network packets, SmartNIC has been …
Sextans: A streaming accelerator for general-purpose sparse-matrix dense-matrix multiplication
Sparse-Matrix Dense-Matrix multiplication (SpMM) is the key operator for a wide range of
applications including scientific computing, graph processing, and deep learning …
ReGraph: Scaling graph processing on HBM-enabled FPGAs with heterogeneous pipelines
The use of FPGAs for efficient graph processing has attracted significant interest. Recent
memory subsystem upgrades including the introduction of HBM in FPGAs promise to further …
Unleashing network/accelerator co-exploration potential on FPGAs: A deeper joint search
Recently, algorithm-hardware (HW) co-exploration for neural networks (NNs) has become
the key to obtaining high-quality solutions. However, previous efforts for field-programmable …
Near-memory computing on FPGAs with 3D-stacked memories: Applications, architectures, and optimizations
The near-memory computing (NMC) paradigm has transpired as a promising method for
overcoming the memory wall challenges of future computing architectures. Modern systems …
Automatic creation of high-bandwidth memory architectures from domain-specific languages: The case of computational fluid dynamics
Numerical simulations can help solve complex problems. Most of these algorithms are
massively parallel and thus good candidates for FPGA acceleration thanks to spatial …
MiCache: An MSHR-inclusive Non-blocking Cache Design for FPGAs
S Xu, S Lu, Z Shao, X Liao, H Jin - Proceedings of the 2024 ACM/SIGDA …, 2024 - dl.acm.org
On FPGAs, customizing data parallelism can significantly improve performances of
applications. However, a large number of applications, such as sparse matrix multiplication …
ThunderGP: Resource-efficient graph processing framework on FPGAs with HLS
FPGA has been an emerging computing infrastructure in datacenters benefiting from fine-
grained parallelism, energy efficiency, and reconfigurability. Meanwhile, graph processing …
Exploiting HBM on FPGAs for data processing
Field Programmable Gate Arrays (FPGAs) are increasingly being used in data centers and
the cloud due to their potential to accelerate certain workloads as well as for their …
ScalaBFS2: A High-performance BFS Accelerator on an HBM-enhanced FPGA Chip
K Li, S Xu, Z Shao, R Zheng, X Liao, H Jin - ACM Transactions on …, 2024 - dl.acm.org
The introduction of High Bandwidth Memory (HBM) to the FPGA chip makes it possible for
an FPGA-based accelerator to leverage the huge memory bandwidth of HBM to improve its …