xCCL: A Survey of Industry-Led Collective Communication Libraries for Deep Learning
Abstract Machine learning techniques have become ubiquitous both in industry and
academic applications. Increasing model sizes and training data volumes necessitate fast …
HyperX topology: First at-scale implementation and comparison to the fat-tree
The de-facto standard topology for modern HPC systems and data-centers are Folded Clos
networks, commonly known as Fat-Trees. The number of network endpoints in these …
Interference between I/O and MPI traffic on fat-tree networks
Network congestion arising from simultaneous data transfers can be a significant
performance bottleneck for many applications, especially when network resources are …
Enabling callback-driven runtime introspection via MPI_T
MA Hermanns, NT Hjelm, M Knobloch… - Proceedings of the 25th …, 2018 - dl.acm.org
Understanding the behavior of parallel applications that use the Message Passing Interface
(MPI) is critical for optimizing communication performance. Performance tools for MPI …
Interactive Investigation of Traffic Congestion on Fat‐Tree Networks Using TreeScope
Parallel simulation codes often suffer from performance bottlenecks due to network
congestion, leaving millions of dollars of investments underutilized. Given a network …
Overhead of using spare nodes
With the increasing fault rate on high-end supercomputers, the topic of fault tolerance has
been gathering attention. To cope with this situation, various fault-tolerance techniques are …
The first supercomputer with HyperX topology: A viable alternative to fat-trees?
The state-of-the-art topology for modern supercomputers are Folded Clos networks, aka Fat-
Trees. The node count in these massively parallel systems is steadily increasing. This forces …
The MPI_T events interface: An early evaluation and overview of the interface
Understanding the behavior of parallel applications that use the Message Passing Interface
(MPI) is critical for optimizing communication performance. Performance tools for MPI …
DragonView: Toward Understanding Network Interference in Dragonfly-based Supercomputers
Acquiring and maintaining high performance computing systems represents a substantial,
long term investment. Therefore, it is paramount to optimize their utilization in order to …