Optimization of MPI collective communication on BlueGene/L systems
G Almási, P Heidelberger, CJ Archer… - Proceedings of the 19th …, 2005 - dl.acm.org
BlueGene/L is currently the world's fastest supercomputer. It consists of a large number of
low power dual-processor compute nodes interconnected by high speed torus and collective …
Optical interconnects for extreme scale computing systems
Large-scale high performance computing is permeating nearly every corner of modern
applications spanning from scientific research and business operations, to medical …
Topology mapping for Blue Gene/L supercomputer
Mapping virtual processes onto physical processors is one of the most important issues in
parallel computing. The problem of mapping of processes/tasks onto processors is …
Bandwidth steering in HPC using silicon nanophotonics
As bytes-per-FLOP ratios continue to decline, communication is becoming a bottleneck for
performance scaling. This paper describes bandwidth steering in HPC using emerging …
Topological properties assessment of optoelectronic architectures
Contradictory needs for highly scalable, high speed, low latency, and low-cost architectures
turn researchers' attention toward optoelectronic architectures. This is due to its ability to …
Scaling parallel I/O performance through I/O delegate and caching system
Increasingly complex scientific applications require massive parallelism to achieve the goals
of fidelity and high computational performance. Such applications periodically offload …
Aphid: Hierarchical task placement to enable a tapered fat tree topology for lower power and cost in HPC networks
The power and procurement cost of bandwidth in system-wide networks has forced a steady
drop in the byte/flop ratio. This trend of computation becoming faster relative to the network …
Parallel file system analysis through application I/O tracing
Input/Output (I/O) operations can represent a significant proportion of the run-time of
parallel scientific computing applications. Although there have been several advances in file …
TAGO: Rethinking routing design in high performance reconfigurable networks
Many reconfigurable network topologies have been proposed in the past. However, efficient
routing on top of these flexible interconnects still presents a challenge. In this work, we …
Mapping communication layouts to network hardware characteristics on massive-scale Blue Gene systems
For parallel applications running on high-end computing systems, which processes of an
application get launched on which processing cores is typically determined at application …