Domain-specific multi-level IR rewriting for GPU: The Open Earth compiler for GPU-accelerated climate simulation

T Gysi, C Müller, O Zinenko, S Herhut, E Davis… - ACM Transactions on …, 2021 - dl.acm.org
Most compilers have a single core intermediate representation (IR) (e.g., LLVM) sometimes
complemented with vaguely defined IR-like data structures. This IR is commonly low-level …

AN5D: automated stencil framework for high-degree temporal blocking on GPUs

K Matsumura, HR Zohouri, M Wahib, T Endo… - Proceedings of the 18th …, 2020 - dl.acm.org
Stencil computation is one of the most widely-used compute patterns in high performance
computing applications. Spatial and temporal blocking have been proposed to overcome the …

Efficient simulation execution of cellular automata on GPU

D Cagigas-Muñiz, F Diaz-del-Rio… - … Modelling Practice and …, 2022 - Elsevier
Graphics Processing Units (GPUs) can be used as convenient hardware
accelerators to speed up Cellular Automata (CA) simulations, which are employed in many …

Toward accelerated stencil computation by adapting tensor core unit on GPU

X Liu, Y Liu, H Yang, J Liao, M Li, Z Luan… - Proceedings of the 36th …, 2022 - dl.acm.org
The Tensor Core Unit (TCU) has been increasingly adopted on modern high performance
processors, specialized in boosting the performance of general matrix multiplication …

Scalable Tuning of (OpenMP) GPU Applications via Kernel Record and Replay

K Parasyris, G Georgakoudis, E Rangel… - Proceedings of the …, 2023 - dl.acm.org
HPC is a heterogeneous world in which host and device code are interleaved throughout
the application. Given the significant performance advantage of accelerators, device code …

A μ-mode integrator for solving evolution equations in Kronecker form

M Caliari, F Cassini, L Einkemmer, A Ostermann… - Journal of …, 2022 - Elsevier
In this paper, we propose a μ-mode integrator for computing the solution of stiff evolution
equations. The integrator is based on a d-dimensional splitting approach and uses exact …

On optimizing complex stencils on GPUs

PS Rawat, M Vaidya… - 2019 IEEE …, 2019 - ieeexplore.ieee.org
Stencil computations are often the compute-intensive kernel in many scientific applications.
With the increasing demand for computational accuracy, and the emergence of massively …

A versatile software systolic execution model for GPU memory-bound kernels

P Chen, M Wahib, S Takizawa, R Takano… - Proceedings of the …, 2019 - dl.acm.org
This paper proposes a versatile high-performance execution model, inspired by systolic
arrays, for memory-bound regular kernels running on CUDA-enabled GPUs. We formulate a …

Automated code generation of high-order stencils for a dataflow architecture

R Sai, J Mellor-Crummey, J Xu… - … Conference for High …, 2024 - ieeexplore.ieee.org
Finite-difference methods based on high-order stencils are widely used in seismic
simulations, weather forecasting, and computational fluid dynamics. Recently, multiple …

Accelerating high-order stencils on GPUs

R Sai, J Mellor-Crummey, X Meng… - 2020 IEEE/ACM …, 2020 - ieeexplore.ieee.org
While implementation strategies for low-order stencils on GPUs have been well-studied in
the literature, not all of the techniques work well for high-order stencils, such as those used …