The OpenMP cluster programming model
H Yviquel, M Pereira, E Francesquini… - … Proceedings of the …, 2022 - dl.acm.org
Despite the various research initiatives and proposed programming models, efficient
solutions for parallel programming in HPC clusters still rely on a complex combination of …
Remote OpenMP offloading
A Patel, J Doerfert - Proceedings of the 27th ACM SIGPLAN Symposium …, 2022 - dl.acm.org
OpenMP has a long and successful history in parallel programming for CPUs, and more
recently GPUs through accelerator offloading. In this work we show that the OpenMP …
A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems
P Czarnul - Concurrency and Computation: Practice and …, 2023 - Wiley Online Library
In the article, we have proposed a framework that allows programming a parallel application
for a multi‐node system, with one or more graphical processing units (GPUs) per node …
MPI-based Remote OpenMP Offloading: A More Efficient and Easy-to-use Implementation
MPI+X is the most popular hybrid programming model for distributed computation on
modern heterogeneous HPC systems. Nonetheless, for simplicity, HPC developers ideally …
A gem5 implementation of the sequential codelet model: Reducing overhead and expanding the software memory interface
D Fox, JM Monsalve Diaz, X Li - Proceedings of the SC'23 Workshops of …, 2023 - dl.acm.org
Modern tasking models define applications in a fine-grained manner that necessitates lower
overhead per segment of computation. Fine-grained tasks, if done right, enable higher …
On Memory Codelets: Prefetching, Recoding, Moving and Streaming Data
For decades, memory capabilities have scaled up much slower than compute capabilities,
leaving memory utilization as a major bottleneck. Prefetching and cache hierarchies mitigate …
libMPNode: An OpenMP runtime for parallel processing across incoherent domains
In this work we describe libMPNode, an OpenMP runtime designed for efficient
multithreaded execution across systems composed of multiple non-cache-coherent …
Optimizing performance and energy efficiency in massively parallel systems
R Nozal - 2022 - repositorio.unican.es
Heterogeneous systems are becoming increasingly relevant, due to their performance and
energy efficiency capabilities, being present in all types of computing platforms, from …
FOTV: A generic device offloading framework for OpenMP
JL Vazquez, P Sanchez - … : Enabling Massive Node-Level Parallelism: 17th …, 2021 - Springer
Since the introduction of the “target” directive in the 4.0 specification, the usage of OpenMP
for heterogeneous computing programming has increased significantly. However, the …
2023 AI Testbed Expeditions Report
The recent trend in computing toward deep learning has resulted in the development of a
variety of highly innovative AI accelerator architectures. One such architecture, the Cerebras …