Resiliency in numerical algorithm design for extreme scale simulations
This work is based on the seminar titled 'Resiliency in Numerical Algorithm Design for
Extreme Scale Simulations' held March 1–6, 2020, at Schloss Dagstuhl, that was attended …
Analysis of OpenMP 4.5 offloading in implementations: correctness and overhead
The OpenMP language features have been evolving to meet the rapid development in
hardware platforms. This journal focuses on evaluating implementations of OpenMP 4.5 …
ECP SOLLVE: validation and verification testsuite status update and compiler insight for OpenMP
The OpenMP language continues to evolve with every new specification release, as does
the need to validate and verify the new features that have been implemented by the different …
A Study of Performance Programming of CPU, GPU accelerated Computers and SIMD Architecture
X Yi - arXiv preprint arXiv:2409.10661, 2024 - arxiv.org
Parallel computing is a standard approach to achieving high-performance computing (HPC).
Three commonly used methods to implement parallel computing include: 1) applying …
OpenMP target device offloading for the SX-Aurora TSUBASA vector engine
T Cramer, M Römmer, B Kosmynin, E Focht… - Parallel Processing and …, 2020 - Springer
Driven by the heterogeneity trend in modern supercomputers, OpenMP provides support for
heterogeneous systems since 2013. Having a single programming model for all kinds of …
Analyzing the Performance Portability of Tensor Decomposition
We employ pressure point analysis and roofline modeling to identify performance
bottlenecks and determine an upper bound on the performance of the Canonical Polyadic …
OpenMP Offload Features and Strategies for High Performance across Architectures and Compilers
A Bhattacharjee, CS Daley… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
High performance accelerated computing has dawned in a new era of highly specialized
code that depends on the target architecture. All the latest pre-exascale and exascale class …
HeroSDK: Streamlining Heterogeneous RISC-V Accelerated Computing from Embedded to High-Performance Systems
Heterogeneous computing systems couple a general-purpose host processor with a single
or multiple domain-specific accelerators. Generally, embedded systems exploit …
Exploring OpenMP GPU Offloading for Implementing Convolutional Neural Networks
Computing on heterogeneous architecture involving CPUs and accelerators is now a
popular choice of parallel computing. As a directive-based programming model, OpenMP …
Evaluating the performance of OpenMP offloading on the NEC SX-Aurora TSUBASA vector engine
T Cramer, B Kosmynin, S Moll, M Römmer… - Supercomputing …, 2021 - superfri.org
The NEC SX-Aurora TSUBASA vector engine (VE) follows the tradition of long
vector processors for high-performance computing (HPC). The technology combines the …