NERO: A near high-bandwidth memory stencil accelerator for weather prediction modeling
Ongoing climate change calls for fast and accurate weather and climate modeling. However,
when solving large-scale weather prediction simulations, state-of-the-art CPU and GPU …
Casper: Accelerating stencil computations using near-cache processing
Stencil computations are commonly used in a wide variety of scientific applications, ranging
from large-scale weather prediction to solving partial differential equations. Stencil …
Accelerating weather prediction using near-memory reconfigurable fabric
Ongoing climate change calls for fast and accurate weather and climate modeling. However,
when solving large-scale weather prediction simulations, state-of-the-art CPU and GPU …
Is FPGA useful for hash joins?
Benefiting from the fine-grained parallelism and energy efficiency, heterogeneous
computing platforms featuring FPGAs are becoming more and more common in data …
Axonn: Energy-aware execution of neural network inference on multi-accelerator heterogeneous SoCs
The energy and latency demands of critical workload execution, such as object detection, in
embedded systems vary based on the physical system state and other external factors …
HotTiles: Accelerating SpMM with Heterogeneous Accelerator Architectures
Sparse Matrix Dense Matrix Multiplication (SpMM) is an important kernel with application
across a wide range of domains, including machine learning and linear algebra solvers. In …
Boyi: A systematic framework for automatically deciding the right execution model of OpenCL applications on FPGAs
FPGA vendors provide OpenCL software development kits for easier programmability, with
the goal of replacing the time-consuming and error-prone register-transfer level (RTL) …
Accelerating sparse deep neural networks on FPGAs
Deep neural networks (DNNs) have been widely adopted in many domains, including
computer vision, natural language processing, and medical care. Recent research reveals …
Resource-aware collaborative allocation for CPU-FPGA cloud environments
Cloud Warehouses have been exploiting CPU-FPGA environments to accelerate multi-
tenant applications to achieve scalability and maximize resource utilization. In this scenario …
Mocha: Multinode cost optimization in heterogeneous clouds with accelerators
FPGAs have been widely deployed in public clouds, e.g., Amazon Web Services (AWS) and
Huawei Cloud. However, simply offloading accelerated kernels from CPU hosts to PCIe …