Model-based selection of optimal MPI broadcast algorithms for multi-core clusters
The performance of collective communication operations determines the overall
performance of MPI applications. Different algorithms have been developed and …
Mcr-dl: Mix-and-match communication runtime for deep learning
In recent years, the training requirements of many state-of-the-art Deep Learning (DL)
models have scaled beyond the compute and memory capabilities of a single processor …
Generalized collective algorithms for the exascale era
Exascale supercomputers have renewed the exigence of improving distributed
communication, specifically MPI collectives. Previous works accelerated collectives for …
Analytic modeling of idle waves in parallel programs: communication, cluster topology, and noise impact
Most distributed-memory bulk-synchronous parallel programs in HPC assume that compute
resources are available continuously and homogeneously across the allocated set of …
ACCLAiM: Advancing the practicality of MPI collective communication autotuning using machine learning
MPI collective communication is an omnipresent communication model for high-
performance computing (HPC) systems. The performance of a collective operation depends …
Benchmarking Julia's communication performance: Is Julia HPC ready or Full HPC?
S Hunold, S Steiner - 2020 IEEE/ACM Performance Modeling …, 2020 - ieeexplore.ieee.org
Julia has quickly become one of the main programming languages for computational
sciences, mainly due to its speed and flexibility. The speed and efficiency of Julia are the …
Efficient and accurate selection of optimal collective communication algorithms using analytical performance modeling
The performance of collective operations has been a critical issue since the advent of
Message Passing Interface (MPI). Many algorithms have been proposed for each MPI …
A FACT-based approach: Making machine learning collective autotuning feasible on exascale systems
According to recent performance analyses, MPI collective operations make up a quarter of
the execution time on production systems. Machine learning (ML) autotuners use supervised …
OMPICollTune: Autotuning MPI collectives by incremental online learning
S Hunold, S Steiner - 2022 IEEE/ACM International Workshop …, 2022 - ieeexplore.ieee.org
Collective communication operations, such as Broadcast or Reduce, are fundamental
cornerstones in many high-performance applications. Most collective operations can be …
Algorithm selection of MPI collectives considering system utilization
MPI collective communications play an important role in coordinating and exchanging data
among parallel processes in high performance computing. Various algorithms exist for …