HPX - the C++ standard library for parallelism and concurrency

H Kaiser, P Diehl, AS Lemoine, BA Lelbach… - Journal of Open …, 2020 - joss.theoj.org
The new challenges presented by exascale system architectures have resulted in difficulty
achieving the desired scalability using traditional distributed-memory runtimes …

Stellar mergers with HPX-Kokkos and SYCL: Methods of using an asynchronous many-task runtime system with SYCL

G Daiß, P Diehl, H Kaiser, D Pflüger - Proceedings of the 2023 …, 2023 - dl.acm.org
Ranging from NVIDIA GPUs to AMD GPUs and Intel GPUs: Given the heterogeneity of
available accelerator cards within current supercomputers, portability is a key aspect for …

An asynchronous and task-based implementation of peridynamics utilizing HPX - the C++ standard library for parallelism and concurrency

P Diehl, PK Jha, H Kaiser, R Lipton, M Lévesque - SN Applied Sciences, 2020 - Springer
On modern supercomputers, asynchronous many task systems are emerging to address the
new architecture of computational nodes. Through this shift of increasing cores per node, a …

Simulating stellar merger using HPX/Kokkos on A64FX on Supercomputer Fugaku

P Diehl, G Daiß, K Huck, D Marcello, S Shiber… - The Journal of …, 2024 - Springer
The increasing availability of machines relying on non-GPU architectures, such as ARM
A64FX in high-performance computing, provides a set of interesting challenges to …

Callback-based completion notification using MPI Continuations

J Schuchart, P Samfass, C Niethammer, J Gracia… - Parallel Computing, 2021 - Elsevier
Asynchronous programming models (APM) are gaining more and more traction, allowing
applications to expose the available concurrency to a runtime system tasked with …

Beyond fork-join: Integration of performance portable Kokkos kernels with HPX

G Daiß, M Simberg, A Reverdell… - 2021 IEEE …, 2021 - ieeexplore.ieee.org
Between a widening range of GPU vendors and the trend of having more GPUs per compute
node in supercomputers such as Summit, Perlmutter, Frontier and Aurora, developing …

From task-based GPU work aggregation to stellar mergers: Turning fine-grained CPU tasks into portable GPU kernels

G Daiß, P Diehl, D Marcello… - 2022 IEEE/ACM …, 2022 - ieeexplore.ieee.org
Meeting both scalability and performance portability requirements is a challenge for any
HPC application, especially for adaptively refined ones. In Octo-Tiger, an astrophysics …

From merging frameworks to merging stars: Experiences using HPX, Kokkos and SIMD types

G Daiß, SY Singanaboina, P Diehl… - 2022 IEEE/ACM 7th …, 2022 - ieeexplore.ieee.org
Octo-Tiger, a large-scale 3D AMR code for the merger of stars, uses a combination of HPX,
Kokkos and explicit SIMD types, aiming to achieve performance-portability for a broad range …

Asynchronous-Many-Task Systems: Challenges and Opportunities--Scaling an AMR Astrophysics Code on Exascale machines using Kokkos and HPX

G Daiß, P Diehl, J Yan, JK Holmen, R Gayatri… - arXiv preprint arXiv …, 2024 - arxiv.org
Dynamic and adaptive mesh refinement is pivotal in high-resolution, multi-physics,
multi-model simulations, necessitating precise physics resolution in localized areas across …

Broad performance measurement support for asynchronous multi-tasking with APEX

KA Huck - 2022 IEEE/ACM 7th International Workshop on …, 2022 - ieeexplore.ieee.org
APEX (Autonomic Performance Environment for eXascale) is a performance measurement
library for distributed, asynchronous multitasking runtime systems. It provides support for …