Optimization of MPI collective communication on BlueGene/L systems

G Almási, P Heidelberger, CJ Archer… - Proceedings of the 19th …, 2005 - dl.acm.org
BlueGene/L is currently the world's fastest supercomputer. It consists of a large number of
low power dual-processor compute nodes interconnected by high speed torus and collective …

Optical interconnects for extreme scale computing systems

S Rumley, M Bahadori, R Polster, SD Hammond… - Parallel Computing, 2017 - Elsevier
Large-scale high performance computing is permeating nearly every corner of modern
applications spanning from scientific research and business operations, to medical …

Topology mapping for Blue Gene/L supercomputer

H Yu, IH Chung, J Moreira - Proceedings of the 2006 ACM/IEEE …, 2006 - dl.acm.org
Mapping virtual processes onto physical processors is one of the most important issues in
parallel computing. The problem of mapping of processes/tasks onto processors is …

Bandwidth steering in HPC using silicon nanophotonics

G Michelogiannakis, Y Shen, MY Teh, X Meng… - Proceedings of the …, 2019 - dl.acm.org
As bytes-per-FLOP ratios continue to decline, communication is becoming a bottleneck for
performance scaling. This paper describes bandwidth steering in HPC using emerging …

Topological properties assessment of optoelectronic architectures

BA Mahafzah, AA Al-Adwan, RI Zaghloul - Telecommunication Systems, 2022 - Springer
Contradictory needs for high scalable, high speed, low latency, and low-cost architectures
turn researchers' attention toward optoelectronic architectures. This is due to its ability to …

Scaling parallel I/O performance through I/O delegate and caching system

A Nisar, W Liao, A Choudhary - SC'08: Proceedings of the …, 2008 - ieeexplore.ieee.org
Increasingly complex scientific applications require massive parallelism to achieve the goals
of fidelity and high computational performance. Such applications periodically offload …

Aphid: Hierarchical task placement to enable a tapered fat tree topology for lower power and cost in HPC networks

G Michelogiannakis, KZ Ibrahim, J Shalf… - 2017 17th IEEE/ACM …, 2017 - ieeexplore.ieee.org
The power and procurement cost of bandwidth in system-wide networks has forced a steady
drop in the byte/flop ratio. This trend of computation becoming faster relative to the network …

Parallel file system analysis through application I/O tracing

SA Wright, SD Hammond, SJ Pennycook… - The Computer …, 2013 - academic.oup.com
Input/Output (I/O) operations can represent a significant proportion of the run-time of
parallel scientific computing applications. Although there have been several advances in file …

TAGO: Rethinking routing design in high performance reconfigurable networks

MY Teh, YH Hung, G Michelogiannakis… - … Conference for High …, 2020 - ieeexplore.ieee.org
Many reconfigurable network topologies have been proposed in the past. However, efficient
routing on top of these flexible interconnects still presents a challenge. In this work, we …

Mapping communication layouts to network hardware characteristics on massive-scale Blue Gene systems

P Balaji, R Gupta, A Vishnu, P Beckman - Computer Science-Research …, 2011 - Springer
For parallel applications running on high-end computing systems, which processes of an
application get launched on which processing cores is typically determined at application …