Google Scholar

Generalisation of recursive doubling for allreduce: Now with simulation

Unified x-space parallelization algorithm for conserved discrete unified gas kinetic scheme

Q Zhang, Y Wang, D Pan, J Chen, S Liu, C Zhuo… - Computer Physics …, 2022 - Elsevier

In this paper, the open source multiscale flow solver dugksFoam is optimized with a newly
proposed parallelization strategy and conserved algorithm. A novel X-space parallel …

Cited by 16; 2 versions

[PDF] usenix.org

Swing: Short-cutting rings for higher bandwidth allreduce

D De Sensi, T Bonato, D Saam, T Hoefler - 21st USENIX Symposium on …, 2024 - usenix.org

The allreduce collective operation accounts for a significant fraction of the runtime of
workloads running on distributed systems. One factor determining its performance is the …

Cited by 5; 23 versions

[PDF] nsf.gov

Generalized collective algorithms for the exascale era

M Wilkins, H Wang, P Liu, B Pham… - 2023 IEEE …, 2023 - ieeexplore.ieee.org

Exascale supercomputers have renewed the exigence of improving distributed
communication, specifically MPI collectives. Previous works accelerated collectives for …

Cited by 3; 6 versions

[HTML] sciencedirect.com

An optimisation of allreduce communication in message-passing systems

A Jocksch, N Ohana, E Lanti, E Koutsaniti… - Parallel Computing, 2021 - Elsevier

Collective communication, namely the pattern allreduce in message-passing systems, is
optimised based on measurements at the installation time of the library. The algorithms used …

Cited by 6; 5 versions

[PDF] arxiv.org

Revisiting the Time Cost Model of AllReduce

D Xiong, L Chen, Y Jiang, D Li, S Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

AllReduce is an important and popular collective communication primitive, which has been
widely used in areas such as distributed machine learning and high performance …

2 versions

[PDF] acm.org

Impact of cache coherence on the performance of shared-memory based MPI primitives: a case study for broadcast on Intel Xeon Scalable processors

G Katevenis, M Ploumidis, M Marazakis - Proceedings of the 52nd …, 2023 - dl.acm.org

Recent processor advances have made feasible HPC nodes with high core counts, capable
of hosting tens or even, hundreds of processes. Therefore, designing MPI collective …

Cited by 1

[PDF] sciencedirect.com

Towards leveraging collective performance with the support of MPI 4.0 features in MPC

S Bouhrour, T Pepin, J Jaeger - Parallel Computing, 2022 - Elsevier

Persistent collective communications and communicator splitting according to the underlying
hardware topology have recently been voted in the MPI standard. Persistent semantics …

Cited by 4; 3 versions

[PDF] arxiv.org

A generalization of the allreduce operation

D Kolmakov, X Zhang - arXiv preprint arXiv:2004.09362, 2020 - arxiv.org

Allreduce is one of the most frequently used MPI collective operations, and thus its
performance attracts much attention in the past decades. Many algorithms were developed …

Cited by 5; 2 versions

[PDF] arxiv.org

Optimised allgatherv, reduce_scatter and allreduce communication in message-passing systems

A Jocksch, N Ohana, E Lanti, V Karakasis… - arXiv preprint arXiv …, 2020 - arxiv.org

Collective communications, namely the patterns allgatherv, reduce_scatter, and allreduce in
message-passing systems are optimised based on measurements at the installation time of …

Cited by 4; 2 versions

[PDF] github.io

Implementation and performance evaluation of MPI persistent collectives in MPC: a case study

S Bouhrour, J Jaeger - Proceedings of the 27th European MPI Users' …, 2020 - dl.acm.org

Persistent collective communications have recently been voted in the MPI standard, opening
the door to many optimizations to reduce collectives cost, in particular for recurring …

Cited by 3; 2 versions