Robust searching-based gradient collaborative management in intelligent transportation system

H Shi, H Wang, R Ma, Y Hua, T Song, H Gao… - ACM Transactions on …, 2023 - dl.acm.org
With the rapid development of big data and the Internet of Things (IoT), traffic data from an
Intelligent Transportation System (ITS) is becoming more and more accessible. To …

Swing: Short-cutting Rings for Higher Bandwidth Allreduce

D De Sensi, T Bonato, D Saam, T Hoefler - 21st USENIX Symposium on …, 2024 - usenix.org
The allreduce collective operation accounts for a significant fraction of the runtime of
workloads running on distributed systems. One factor determining its performance is the …

Efficient process arrival pattern aware collective communication for deep learning

P Alizadeh, A Sojoodi, Y Hassan Temucin… - Proceedings of the 29th …, 2022 - dl.acm.org
MPI collective communication operations are used extensively in parallel applications. As
such, researchers have been investigating how to improve their performance and scalability …

OF-WFBP: A near-optimal communication mechanism for tensor fusion in distributed deep learning

Y Gao, Z Zhang, B Hu, AL Jin, C Wu - Parallel Computing, 2023 - Elsevier
The communication bottleneck has severely restricted the scalability of distributed deep
learning. Tensor fusion improves the scalability of data parallelism by overlapping …

An efficient implementation of blocking and persistent MPI collective communication

A Jocksch, JG Piccinali - 2023 - eurompi23.github.io
Persistent collective communication [3] became a feature of the MPI standard in version 4.0
and first implementations are available in various libraries such as MPICH, OpenMPI, and …

MPI for multi-core, multi socket, and GPU architectures: Optimised shared memory allreduce

A Jocksch, JG Piccinali - pasc23.pasc-conference.org
• Today's supercomputers have a growing number of cores per socket and more and more
sockets per node • Intranode communication needs to be efficient also as part of more …