Resiliency in numerical algorithm design for extreme scale simulations

E Agullo, M Altenbernd, H Anzt… - … Journal of High …, 2022 - journals.sagepub.com
This work is based on the seminar titled 'Resiliency in Numerical Algorithm Design for
Extreme Scale Simulations' held March 1–6, 2020, at Schloss Dagstuhl, that was attended …

Analysis of OpenMP 4.5 offloading in implementations: correctness and overhead

JM Diaz, K Friedline, S Pophale, O Hernandez… - Parallel Computing, 2019 - Elsevier
The OpenMP language features have been evolving to meet the rapid development in
hardware platforms. This journal focuses on evaluating implementations of OpenMP 4.5 …

ECP SOLLVE: validation and verification testsuite status update and compiler insight for OpenMP

T Huber, S Pophale, N Baker, M Carr… - 2022 IEEE/ACM …, 2022 - ieeexplore.ieee.org
The OpenMP language continues to evolve with every new specification release, as does
the need to validate and verify the new features that have been implemented by the different …

A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture

X Yi - arXiv preprint arXiv:2409.10661, 2024 - arxiv.org
Parallel computing is a standard approach to achieving high-performance computing (HPC).
Three commonly used methods to implement parallel computing include: 1) applying …

OpenMP target device offloading for the SX-Aurora TSUBASA vector engine

T Cramer, M Römmer, B Kosmynin, E Focht… - Parallel Processing and …, 2020 - Springer
Driven by the heterogeneity trend in modern supercomputers, OpenMP provides support for
heterogeneous systems since 2013. Having a single programming model for all kinds of …

Analyzing the Performance Portability of Tensor Decomposition

S Anderson, K Teranishi, DM Dunlavy… - arXiv preprint arXiv …, 2023 - arxiv.org
We employ pressure point analysis and roofline modeling to identify performance
bottlenecks and determine an upper bound on the performance of the Canonical Polyadic …

OpenMP Offload Features and Strategies for High Performance across Architectures and Compilers

A Bhattacharjee, CS Daley… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
High performance accelerated computing has dawned in a new era of highly specialized
code that depends on the target architecture. All the latest pre-exascale and exascale class …

HeroSDK: Streamlining Heterogeneous RISC-V Accelerated Computing from Embedded to High-Performance Systems

C Koenig, B Forsberg, L Benini - 2024 IEEE 42nd International …, 2024 - ieeexplore.ieee.org
Heterogeneous computing systems couple a general-purpose host processor with a single
or multiple domain-specific accelerators. Generally, embedded systems exploit …

Exploring OpenMP GPU Offloading for Implementing Convolutional Neural Networks

K Yan, Y Shi, Y Yan - Proceedings of the 14th International Workshop on …, 2023 - dl.acm.org
Computing on heterogeneous architecture involving CPUs and accelerators is now a
popular choice of parallel computing. As a directive-based programming model, OpenMP …

Evaluating the performance of OpenMP offloading on the NEC SX-Aurora TSUBASA vector engine

T Cramer, B Kosmynin, S Moll, M Römmer… - Supercomputing …, 2021 - superfri.org
The NEC SX-Aurora TSUBASA vector engine (VE) follows the tradition of long
vector processors for high-performance computing (HPC). The technology combines the …