Model-based selection of optimal MPI broadcast algorithms for multi-core clusters

E Nuriyev, JA Rico-Gallego, A Lastovetsky - Journal of Parallel and …, 2022 - Elsevier
The performance of collective communication operations determines the overall
performance of MPI applications. Different algorithms have been developed and …

MCR-DL: Mix-and-match communication runtime for deep learning

Q Anthony, AA Awan, J Rasley, Y He… - 2023 IEEE …, 2023 - ieeexplore.ieee.org
In recent years, the training requirements of many state-of-the-art Deep Learning (DL)
models have scaled beyond the compute and memory capabilities of a single processor …

Generalized collective algorithms for the exascale era

M Wilkins, H Wang, P Liu, B Pham… - 2023 IEEE …, 2023 - ieeexplore.ieee.org
Exascale supercomputers have renewed the exigence of improving distributed
communication, specifically MPI collectives. Previous works accelerated collectives for …

Analytic modeling of idle waves in parallel programs: communication, cluster topology, and noise impact

A Afzal, G Hager, G Wellein - … , ISC High Performance 2021, Virtual Event …, 2021 - Springer
Most distributed-memory bulk-synchronous parallel programs in HPC assume that compute
resources are available continuously and homogeneously across the allocated set of …

ACCLAiM: Advancing the practicality of MPI collective communication autotuning using machine learning

M Wilkins, Y Guo, R Thakur, P Dinda… - … on Cluster Computing …, 2022 - ieeexplore.ieee.org
MPI collective communication is an omnipresent communication model for
high-performance computing (HPC) systems. The performance of a collective operation depends …

Benchmarking Julia's communication performance: Is Julia HPC ready or Full HPC?

S Hunold, S Steiner - 2020 IEEE/ACM Performance Modeling …, 2020 - ieeexplore.ieee.org
Julia has quickly become one of the main programming languages for computational
sciences, mainly due to its speed and flexibility. The speed and efficiency of Julia are the …

Efficient and accurate selection of optimal collective communication algorithms using analytical performance modeling

E Nuriyev, A Lastovetsky - IEEE Access, 2021 - ieeexplore.ieee.org
The performance of collective operations has been a critical issue since the advent of
Message Passing Interface (MPI). Many algorithms have been proposed for each MPI …

A FACT-based approach: Making machine learning collective autotuning feasible on exascale systems

M Wilkins, Y Guo, R Thakur… - 2021 Workshop on …, 2021 - ieeexplore.ieee.org
According to recent performance analyses, MPI collective operations make up a quarter of
the execution time on production systems. Machine learning (ML) autotuners use supervised …

OMPICollTune: Autotuning MPI collectives by incremental online learning

S Hunold, S Steiner - 2022 IEEE/ACM International Workshop …, 2022 - ieeexplore.ieee.org
Collective communication operations, such as Broadcast or Reduce, are fundamental
cornerstones in many high-performance applications. Most collective operations can be …

Algorithm selection of MPI collectives considering system utilization

M Salimi Beni, S Hunold, B Cosenza - European Conference on Parallel …, 2023 - Springer
MPI collective communications play an important role in coordinating and exchanging data
among parallel processes in high performance computing. Various algorithms exist for …