Transformations of high-level synthesis codes for high-performance computing
Spatial computing architectures promise a major stride in performance and energy efficiency
over the traditional load/store devices currently employed in large scale computing systems …
Scalable multi-FPGA design of a discontinuous Galerkin shallow-water model on unstructured meshes
FPGAs are fostering interest as energy-efficient accelerators for scientific simulations,
including for methods operating on unstructured meshes. Considering the potential impact …
Exastencils: advanced multigrid solver generation
Present-day stencil codes are implemented in general-purpose programming languages,
such as Fortran, C, or Java, Python or derivates thereof, and harnesses for parallelism, such …
Shallow water DG simulations on FPGAs: Design and comparison of a novel code generation pipeline
FPGAs are receiving increased attention as a promising architecture for accelerators in HPC
systems. Evolving and maturing development tools based on high-level synthesis promise …
Performance evaluation of pipelined communication combined with computation in OpenCL programming on FPGA
In recent years, many High Performance Computing (HPC) researchers have been drawn to using Field
Programmable Gate Arrays (FPGAs) for HPC applications. We can use FPGAs for …
Boyi: A systematic framework for automatically deciding the right execution model of OpenCL applications on FPGAs
FPGA vendors provide OpenCL software development kits for easier programmability, with
the goal of replacing the time-consuming and error-prone register-transfer level (RTL) …
OpenCL implementation of Cannon's matrix multiplication algorithm on Intel Stratix 10 FPGAs
Stratix 10 FPGA cards have a good potential for the acceleration of HPC workloads since the
Stratix 10 product line introduces devices with a large number of DSP and memory blocks …
Algorithm-hardware co-design of a discontinuous Galerkin shallow-water model for a dataflow architecture on FPGA
We present the first FPGA implementation of the full simulation pipeline of a shallow water
code based on the discontinuous Galerkin method. Using OpenCL and following an …
High-performance spectral element methods on field-programmable gate arrays: implementation, evaluation, and future projection
Improvements in computer systems have historically relied on two well-known observations:
Moore's law and Dennard's scaling. Today, both these observations are ending, forcing …
Parallel processing on FPGA combining computation and communication in OpenCL programming
In recent years, Field Programmable Gate Array (FPGA) has been a topic of interest in High
Performance Computing (HPC) research. Although the biggest problem in utilizing FPGAs …