Unified X-space parallelization algorithm for conserved discrete unified gas kinetic scheme

Q Zhang, Y Wang, D Pan, J Chen, S Liu, C Zhuo… - Computer Physics …, 2022 - Elsevier
In this paper, the open source multiscale flow solver dugksFoam is optimized with a newly
proposed parallelization strategy and conserved algorithm. A novel X-space parallel …

Swing: Short-cutting rings for higher bandwidth allreduce

D De Sensi, T Bonato, D Saam, T Hoefler - 21st USENIX Symposium on …, 2024 - usenix.org
The allreduce collective operation accounts for a significant fraction of the runtime of
workloads running on distributed systems. One factor determining its performance is the …

Generalized collective algorithms for the exascale era

M Wilkins, H Wang, P Liu, B Pham… - 2023 IEEE …, 2023 - ieeexplore.ieee.org
Exascale supercomputers have renewed the exigence of improving distributed
communication, specifically MPI collectives. Previous works accelerated collectives for …

An optimisation of allreduce communication in message-passing systems

A Jocksch, N Ohana, E Lanti, E Koutsaniti… - Parallel Computing, 2021 - Elsevier
Collective communication, namely the pattern allreduce in message-passing systems, is
optimised based on measurements at the installation time of the library. The algorithms used …

Revisiting the Time Cost Model of AllReduce

D Xiong, L Chen, Y Jiang, D Li, S Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
AllReduce is an important and popular collective communication primitive, which has been
widely used in areas such as distributed machine learning and high performance …

Impact of cache coherence on the performance of shared-memory based MPI primitives: a case study for broadcast on Intel Xeon Scalable processors

G Katevenis, M Ploumidis, M Marazakis - Proceedings of the 52nd …, 2023 - dl.acm.org
Recent processor advances have made feasible HPC nodes with high core counts, capable
of hosting tens or even, hundreds of processes. Therefore, designing MPI collective …

Towards leveraging collective performance with the support of MPI 4.0 features in MPC

S Bouhrour, T Pepin, J Jaeger - Parallel Computing, 2022 - Elsevier
Persistent collective communications and communicator splitting according to the underlying
hardware topology have recently been voted in the MPI standard. Persistent semantics …

A generalization of the allreduce operation

D Kolmakov, X Zhang - arXiv preprint arXiv:2004.09362, 2020 - arxiv.org
Allreduce is one of the most frequently used MPI collective operations, and thus its
performance attracts much attention in the past decades. Many algorithms were developed …

Optimised allgatherv, reduce_scatter and allreduce communication in message-passing systems

A Jocksch, N Ohana, E Lanti, V Karakasis… - arXiv preprint arXiv …, 2020 - arxiv.org
Collective communications, namely the patterns allgatherv, reduce_scatter, and allreduce in
message-passing systems are optimised based on measurements at the installation time of …

Implementation and performance evaluation of MPI persistent collectives in MPC: a case study

S Bouhrour, J Jaeger - Proceedings of the 27th European MPI Users' …, 2020 - dl.acm.org
Persistent collective communications have recently been voted in the MPI standard, opening
the door to many optimizations to reduce collectives cost, in particular for recurring …