NERO: A near high-bandwidth memory stencil accelerator for weather prediction modeling

G Singh, D Diamantopoulos… - … Conference on Field …, 2020 - ieeexplore.ieee.org
Ongoing climate change calls for fast and accurate weather and climate modeling. However,
when solving large-scale weather prediction simulations, state-of-the-art CPU and GPU …

Casper: Accelerating stencil computations using near-cache processing

A Denzler, GF Oliveira, N Hajinazar, R Bera… - IEEE …, 2023 - ieeexplore.ieee.org
Stencil computations are commonly used in a wide variety of scientific applications, ranging
from large-scale weather prediction to solving partial differential equations. Stencil …

Accelerating weather prediction using near-memory reconfigurable fabric

G Singh, D Diamantopoulos, J Gómez-Luna… - ACM Transactions on …, 2022 - dl.acm.org
Ongoing climate change calls for fast and accurate weather and climate modeling. However,
when solving large-scale weather prediction simulations, state-of-the-art CPU and GPU …

Is FPGA useful for hash joins?

X Chen, Y Chen, R Bajaj, J He, B He, WF Wong… - CIDR, 2020 - comp.nus.edu.sg
Benefiting from the fine-grained parallelism and energy efficiency, heterogeneous
computing platforms featuring FPGAs are becoming more and more common in data …

AxoNN: Energy-aware execution of neural network inference on multi-accelerator heterogeneous SoCs

I Dagli, A Cieslewicz, J McClurg… - Proceedings of the 59th …, 2022 - dl.acm.org
The energy and latency demands of critical workload execution, such as object detection, in
embedded systems vary based on the physical system state and other external factors …

HotTiles: Accelerating SpMM with Heterogeneous Accelerator Architectures

G Gerogiannis, S Aananthakrishnan… - … Symposium on High …, 2024 - ieeexplore.ieee.org
Sparse Matrix Dense Matrix Multiplication (SpMM) is an important kernel with application
across a wide range of domains, including machine learning and linear algebra solvers. In …

Boyi: A systematic framework for automatically deciding the right execution model of OpenCL applications on FPGAs

J Jiang, Z Wang, X Liu, J Gómez-Luna… - Proceedings of the …, 2020 - dl.acm.org
FPGA vendors provide OpenCL software development kits for easier programmability, with
the goal of replacing the time-consuming and error-prone register-transfer level (RTL) …

Accelerating sparse deep neural networks on FPGAs

S Huang, C Pearson, R Nagi, J Xiong… - 2019 IEEE High …, 2019 - ieeexplore.ieee.org
Deep neural networks (DNNs) have been widely adopted in many domains, including
computer vision, natural language processing, and medical care. Recent research reveals …

Resource-aware collaborative allocation for CPU-FPGA cloud environments

MG Jordan, G Korol, MB Rutzig… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
Cloud Warehouses have been exploiting CPU-FPGA environments to accelerate multi-
tenant applications to achieve scalability and maximize resource utilization. In this scenario …

Mocha: Multinode cost optimization in heterogeneous clouds with accelerators

P Zhou, J Sheng, CH Yu, P Wei, J Wang, D Wu… - The 2021 ACM/SIGDA …, 2021 - dl.acm.org
FPGAs have been widely deployed in public clouds, e.g., Amazon Web Services (AWS) and
Huawei Cloud. However, simply offloading accelerated kernels from CPU hosts to PCIe …