A CUDA-based GPU engine for gprMax: Open source FDTD electromagnetic simulation software

C Warren, A Giannopoulos, A Gray, I Giannakis… - Computer Physics …, 2019 - Elsevier
The Finite-Difference Time-Domain (FDTD) method is a popular numerical
modelling technique in computational electromagnetics. The volumetric nature of the FDTD …

Near-global climate simulation at 1 km resolution: establishing a performance baseline on 4888 GPUs with COSMO 5.0

O Fuhrer, T Chadha, T Hoefler… - Geoscientific Model …, 2018 - gmd.copernicus.org
The best hope for reducing long-standing global climate model biases is by increasing
resolution to the kilometer scale. Here we present results from an ultrahigh-resolution non …

Ginkgo: A Modern Linear Operator Algebra Framework for High Performance Computing

H Anzt, T Cojean, G Flegar, F Göbel… - ACM Transactions on …, 2022 - dl.acm.org
In this article, we present Ginkgo, a modern C++ math library for scientific high performance
computing. While classical linear algebra libraries act on matrix and vector objects, Ginkgo's …

Implications of a metric for performance portability

SJ Pennycook, JD Sewall, VW Lee - Future Generation Computer Systems, 2019 - Elsevier
The term “performance portability” has been informally used in computing to refer to a variety
of notions which generally include: (1) the ability to run one application across multiple …
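The metric proposed in this paper defines the performance portability of an application a solving problem p across a platform set H as the harmonic mean of its per-platform efficiencies e_i(a, p). If any platform in H is unsupported, the metric is zero. The Python sketch below implements that definition under stated assumptions. The function name, the dictionary input format, and the choice to treat non-positive efficiencies as "unsupported" are illustrative and do not come from the authors' code.

```python
from typing import Mapping, Optional


def performance_portability(efficiencies: Mapping[str, Optional[float]]) -> float:
    """Harmonic mean of per-platform efficiencies, or 0 if any platform is unsupported.

    `efficiencies` maps a platform name to an efficiency in (0, 1], or to None
    when the application does not run on that platform. (Input format and the
    handling of non-positive values are assumptions of this sketch.)
    """
    if not efficiencies:
        raise ValueError("platform set H must be non-empty")
    values = list(efficiencies.values())
    # Unsupported on any platform in H -> the metric is defined as 0.
    if any(e is None or e <= 0 for e in values):
        return 0.0
    # |H| / sum(1 / e_i): the harmonic mean penalizes any single poor platform.
    return len(values) / sum(1.0 / e for e in values)
```

For example, with hypothetical platforms `{"cpu": 0.5, "gpu": 1.0}` the result is 2 / (2 + 1) = 2/3. An arithmetic mean would give 0.75 instead. Using the harmonic mean keeps one weak platform from being hidden by a strong one.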

Evaluating attainable memory bandwidth of parallel programming models via BabelStream

T Deakin, J Price, M Martineau… - International Journal …, 2018 - inderscienceonline.com
Many scientific codes consist of memory bandwidth bound kernels. One major advantage of
many-core devices such as general purpose graphics processing units (GPGPUs) and the …
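BabelStream measures attainable memory bandwidth with STREAM-style kernels (Copy, Mul, Add, Triad, Dot). It turns each kernel's runtime into a bandwidth figure based on the number of bytes the kernel moves. The Python sketch below shows only that byte accounting and the semantics of Triad. It is not the benchmark itself, because pure Python cannot approach hardware bandwidth. The function names, the GB/s unit (10^9 bytes per second), and the 8-byte default element size are assumptions.

```python
from typing import List

# Arrays read or written per element by each kernel (illustrative STREAM-style forms):
#   copy:  c[i] = a[i]                  (read a, write c)
#   mul:   b[i] = scalar * c[i]         (read c, write b)
#   add:   c[i] = a[i] + b[i]           (read a and b, write c)
#   triad: a[i] = b[i] + scalar * c[i]  (read b and c, write a)
#   dot:   sum += a[i] * b[i]           (read a and b)
ARRAYS_TOUCHED = {"copy": 2, "mul": 2, "add": 3, "triad": 3, "dot": 2}


def bandwidth_gbs(kernel: str, n: int, seconds: float, bytes_per_element: int = 8) -> float:
    """Attained bandwidth in GB/s (1e9 bytes/s) for one run of `kernel` over n elements."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * n * bytes_per_element
    return bytes_moved / seconds / 1e9


def triad(b: List[float], c: List[float], scalar: float) -> List[float]:
    """Reference semantics of Triad: a[i] = b[i] + scalar * c[i]."""
    return [bi + scalar * ci for bi, ci in zip(b, c)]
```

As a worked example, take a Triad over 2**25 doubles that completes in a hypothetical 0.1 s. It moves 3 × 2**25 × 8 bytes, which works out to about 8.05 GB/s.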

In-depth analyses of unified virtual memory system for GPU accelerated computing

T Allen, R Ge - Proceedings of the International Conference for High …, 2021 - dl.acm.org
The abstraction of a shared memory space over separate CPU and GPU memory domains
has eased the burden of portability for many HPC codebases. However, users pay for the …

Exploring the possibility of a hipSYCL-based implementation of oneAPI

A Alpay, B Soproni, H Wünsche… - Proceedings of the 10th …, 2022 - dl.acm.org
oneAPI is an open standard for a software platform built around SYCL 2020 and accelerated
libraries such as oneMKL as well as low-level building blocks such as oneAPI Level Zero …

AN5D: automated stencil framework for high-degree temporal blocking on GPUs

K Matsumura, HR Zohouri, M Wahib, T Endo… - Proceedings of the 18th …, 2020 - dl.acm.org
Stencil computation is one of the most widely-used compute patterns in high performance
computing applications. Spatial and temporal blocking have been proposed to overcome the …

Performance analysis of CUDA, OpenACC and OpenMP programming models on TESLA V100 GPU

M Khalilov, A Timoveev - Journal of Physics: Conference Series, 2021 - iopscience.iop.org
Graphics processors are widely utilized in modern supercomputers as accelerators. Ability to
perform efficient parallelization and low-level allow scientists to greatly boost performance of …

A metric for performance portability

SJ Pennycook, JD Sewall, VW Lee - arXiv preprint arXiv:1611.07409, 2016 - arxiv.org
The term “performance portability” has been informally used in computing to refer to a variety
of notions which generally include: 1) the ability to run one application across multiple …