The OpenMP cluster programming model

H Yviquel, M Pereira, E Francesquini… - … Proceedings of the …, 2022 - dl.acm.org
Despite the various research initiatives and proposed programming models, efficient
solutions for parallel programming in HPC clusters still rely on a complex combination of …

Remote OpenMP offloading

A Patel, J Doerfert - Proceedings of the 27th ACM SIGPLAN Symposium …, 2022 - dl.acm.org
OpenMP has a long and successful history in parallel programming for CPUs, and more
recently GPUs through accelerator offloading. In this work we show that the OpenMP …

A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems

P Czarnul - Concurrency and Computation: Practice and …, 2023 - Wiley Online Library
In the article, we have proposed a framework that allows programming a parallel application
for a multi‐node system, with one or more graphical processing units (GPUs) per node …

MPI-based Remote OpenMP Offloading: A More Efficient and Easy-to-use Implementation

B Shan, M Araya-Polo, AM Malik… - Proceedings of the 14th …, 2023 - dl.acm.org
MPI+X is the most popular hybrid programming model for distributed computation on
modern heterogeneous HPC systems. Nonetheless, for simplicity, HPC developers ideally …

A gem5 implementation of the sequential codelet model: Reducing overhead and expanding the software memory interface

D Fox, JM Monsalve Diaz, X Li - Proceedings of the SC'23 Workshops of …, 2023 - dl.acm.org
Modern tasking models define applications in a fine-grained manner that necessitates lower
overhead per segment of computation. Fine-grained tasks, if done right, enable higher …

On Memory Codelets: Prefetching, Recoding, Moving and Streaming Data

D Fox, JM Diaz, X Li - arXiv preprint arXiv:2302.00115, 2023 - arxiv.org
For decades, memory capabilities have scaled up much slower than compute capabilities,
leaving memory utilization as a major bottleneck. Prefetching and cache hierarchies mitigate …

libMPNode: An OpenMP runtime for parallel processing across incoherent domains

R Lyerly, SH Kim, B Ravindran - … of the 10th International Workshop on …, 2019 - dl.acm.org
In this work we describe libMPNode, an OpenMP runtime designed for efficient
multithreaded execution across systems composed of multiple non-cache-coherent …

Optimizing performance and energy efficiency in massively parallel systems

R Nozal - 2022 - repositorio.unican.es
Heterogeneous systems are becoming increasingly relevant, due to their performance and
energy efficiency capabilities, being present in all types of computing platforms, from …

FOTV: A generic device offloading framework for OpenMP

JL Vazquez, P Sanchez - … : Enabling Massive Node-Level Parallelism: 17th …, 2021 - Springer
Since the introduction of the “target” directive in the 4.0 specification, the usage of OpenMP
for heterogeneous computing programming has increased significantly. However, the …

2023 AI Testbed Expeditions Report

V Vishwanath, M Emani, V Sastry, W Arnold, R Thakur… - 2023 - osti.gov
The recent trend in computing toward deep learning has resulted in the development of a
variety of highly innovative AI accelerator architectures. One such architecture, the Cerebras …