xCCL: A Survey of Industry-Led Collective Communication Libraries for Deep Learning

A Weingram, Y Li, H Qi, D Ng, L Dai, X Lu - Journal of Computer Science …, 2023 - Springer
Abstract Machine learning techniques have become ubiquitous both in industry and
academic applications. Increasing model sizes and training data volumes necessitate fast …

HyperX topology: First at-scale implementation and comparison to the fat-tree

J Domke, S Matsuoka, IR Ivanov, Y Tsushima… - Proceedings of the …, 2019 - dl.acm.org
The de-facto standard topology for modern HPC systems and data-centers are Folded Clos
networks, commonly known as Fat-Trees. The number of network endpoints in these …

Interference between I/O and MPI traffic on fat-tree networks

KA Brown, N Jain, S Matsuoka, M Schulz… - Proceedings of the 47th …, 2018 - dl.acm.org
Network congestion arising from simultaneous data transfers can be a significant
performance bottleneck for many applications, especially when network resources are …

Enabling callback-driven runtime introspection via MPI_T

MA Hermanns, NT Hjelm, M Knobloch… - Proceedings of the 25th …, 2018 - dl.acm.org
Understanding the behavior of parallel applications that use the Message Passing Interface
(MPI) is critical for optimizing communication performance. Performance tools for MPI …

Interactive Investigation of Traffic Congestion on Fat‐Tree Networks Using TreeScope

H Bhatia, N Jain, A Bhatele, Y Livnat… - Computer Graphics …, 2018 - Wiley Online Library
Parallel simulation codes often suffer from performance bottlenecks due to network
congestion, leaving millions of dollars of investments underutilized. Given a network …

Overhead of using spare nodes

A Hori, K Yoshinaga, T Herault… - … Journal of High …, 2020 - journals.sagepub.com
With the increasing fault rate on high-end supercomputers, the topic of fault tolerance has
been gathering attention. To cope with this situation, various fault-tolerance techniques are …

The first supercomputer with HyperX topology: A viable alternative to fat-trees?

J Domke, S Matsuoka, I Radanov… - … IEEE Symposium on …, 2019 - ieeexplore.ieee.org
The state-of-the-art topology for modern supercomputers are Folded Clos networks, aka
Fat-Trees. The node count in these massively parallel systems is steadily increasing. This forces …

The MPI_T events interface: An early evaluation and overview of the interface

MA Hermanns, NT Hjelm, M Knobloch, K Mohror… - Parallel computing, 2019 - Elsevier
Understanding the behavior of parallel applications that use the Message Passing Interface
(MPI) is critical for optimizing communication performance. Performance tools for MPI …

DragonView: Toward Understanding Network Interference in Dragonfly-based Supercomputers

Acquiring and maintaining high performance computing systems represents a substantial,
long term investment. Therefore, it is paramount to optimize their utilization in order to …

Resource Contention due to Data Movement on HPC Systems

KA Brown - t2r2.star.titech.ac.jp
Large-scale high-performance systems (HPC), or supercomputers, are composed of
hundreds/thousands of nodes that are interconnected using advanced network topologies …