A survey of MPI usage in the US exascale computing project
Summary: The Exascale Computing Project (ECP) is currently the primary effort in the United
States focused on developing “exascale” levels of computing capabilities, including …
Finepoints: Partitioned multithreaded MPI communication
The MPI multithreading model has been historically difficult to optimize; the interface that it
provides for threads was designed as a process-level interface. This model has led to …
MPIX Stream: An explicit solution to hybrid MPI+X programming
The hybrid MPI+X programming paradigm, where X refers to threads or GPUs, has gained
prominence in the high-performance computing arena. This corresponds to a trend of …
Implementation and evaluation of MPI 4.0 partitioned communication libraries
Partitioned point-to-point communication primitives provide a performance-oriented
mechanism to support a hybrid parallel programming model and have been included in the …
Enabling efficient multithreaded MPI communication through a library-based implementation of MPI endpoints
Modern high-speed interconnection networks are designed with capabilities to support
communication from multiple processor cores. The MPI endpoints extension has been …
Give MPI threading a fair chance: A study of multithreaded MPI designs
The Message Passing Interface (MPI) has been one of the most prominent programming
paradigms in high-performance computing (HPC) for the past decade. Lately, with changes …
Exascale machines require new programming paradigms and runtimes
Extreme scale parallel computing systems will have tens of thousands of optionally
accelerator-equipped nodes with hundreds of cores each, as well as deep memory …
Improving MPI multi-threaded RMA communication performance
One-sided communication is crucial to enabling communication concurrency. As core counts
have increased, particularly with many-core architectures, one-sided (RMA) communication …
How I learned to stop worrying about user-visible endpoints and love MPI
MPI+threads is gaining prominence as an alternative to the traditional "MPI everywhere"
model in order to better handle the disproportionate increase in the number of cores …
Partitioned collective communication
Partitioned point-to-point communication and persistent collective communication were both
recently standardized in MPI-4.0. Each offers performance and scalability advantages over …