Robust searching-based gradient collaborative management in intelligent transportation system
With the rapid development of big data and the Internet of Things (IoT), traffic data from an
Intelligent Transportation System (ITS) is becoming more and more accessible. To …
Swing: Short-cutting Rings for Higher Bandwidth Allreduce
The allreduce collective operation accounts for a significant fraction of the runtime of
workloads running on distributed systems. One factor determining its performance is the …
Efficient process arrival pattern aware collective communication for deep learning
MPI collective communication operations are used extensively in parallel applications. As
such, researchers have been investigating how to improve their performance and scalability …
OF-WFBP: A near-optimal communication mechanism for tensor fusion in distributed deep learning
The communication bottleneck has severely restricted the scalability of distributed deep
learning. Tensor fusion improves the scalability of data parallelism by overlapping …
An efficient implementation of blocking and persistent MPI collective communication
A Jocksch, JG Piccinali - 2023 - eurompi23.github.io
Persistent collective communication [3] became a feature of the MPI standard in version 4.0
and first implementations are available in various libraries such as MPICH, OpenMPI, and …
MPI for multi-core, multi socket, and GPU architectures: Optimised shared memory allreduce
A Jocksch, JG Piccinali - pasc23.pasc-conference.org
• Today's supercomputers have a growing number of cores per socket and more and more
sockets per node
• Intranode communication needs to be efficient also as part of more …