The landscape of exascale research: A data-driven literature analysis

S Heldens, P Hijma, BV Werkhoven… - ACM Computing …, 2020 - dl.acm.org
The next generation of supercomputers will break the exascale barrier. Soon we will have
systems capable of at least one quintillion (billion billion) floating-point operations per …

Task bench: A parameterized benchmark for evaluating parallel runtime performance

E Slaughter, W Wu, Y Fu, L Brandenburg… - … Conference for High …, 2020 - ieeexplore.ieee.org
We present Task Bench, a parameterized benchmark designed to explore the performance
of distributed programming systems under a variety of application scenarios. Task Bench …

Benchmarking Fortran DO CONCURRENT on CPUs and GPUs using BabelStream

JR Hammond, T Deakin, J Cownie… - 2022 IEEE/ACM …, 2022 - ieeexplore.ieee.org
Fortran DO CONCURRENT has emerged as a new way to achieve parallel execution of
loops on CPUs and GPUs. This paper studies the performance portability of this construct on …

Benchmarking the Parallel 1D Heat Equation Solver in Chapel, Charm++, C++, HPX, Go, Julia, Python, Rust, Swift, and Java

P Diehl, M Morris, SR Brandt, N Gupta… - European Conference on …, 2023 - Springer
Many scientific high-performance codes that simulate, e.g., black holes, coastal waves, climate
and weather, etc. rely on block-structured meshes and use finite differencing methods to …

Control Replication: Compiling implicit parallelism to efficient SPMD with logical regions

E Slaughter, W Lee, S Treichler, W Zhang… - Proceedings of the …, 2017 - dl.acm.org
We present control replication, a technique for generating high-performance and scalable
SPMD code from implicitly parallel programs. In contrast to traditional parallel programming …

Quantifying Overheads in Charm++ and HPX Using Task Bench

N Wu, I Gonidelis, S Liu, Z Fink, N Gupta… - … Conference on Parallel …, 2022 - Springer
Asynchronous Many-Task (AMT) runtime systems take advantage of multi-core
architectures with light-weight threads, asynchronous executions, and smart scheduling. In …

FlipBack: automatic targeted protection against silent data corruption

X Ni, LV Kale - SC'16: Proceedings of the International …, 2016 - ieeexplore.ieee.org
The decreasing size of transistors has been critical to the increase in capacity of
supercomputers. The smaller the transistors are, the less energy is required to flip a bit, and thus …

LAPPS: Locality-aware productive prefetching support for PGAS

E Kayraklioglu, MP Ferguson… - ACM Transactions on …, 2018 - dl.acm.org
Prefetching is a well-known technique to mitigate scalability challenges in the Partitioned
Global Address Space (PGAS) model. It has been studied as either an automated compiler …

What quantum can learn from classical computer engineering

AY Matsuura, T Mattson - ACM Transactions on Quantum Computing, 2025 - dl.acm.org
Quantum computing represents a paradigm shift requiring reconceptualization of algorithms,
architectures, and software. Although much is new, there is much that quantum computing …

Evaluating data parallelism in C++ using the Parallel Research Kernels

JR Hammond, TG Mattson - … of the International Workshop on OpenCL, 2019 - dl.acm.org