ARQUIN: architectures for multinode superconducting quantum computers

J Ang, G Carini, Y Chen, I Chuang, M Demarco… - ACM Transactions on …, 2024 - dl.acm.org
Many proposals to scale quantum technology rely on modular or distributed designs
wherein individual quantum processors, called nodes, are linked together to form one large …

P3DFFT: A framework for parallel computations of Fourier transforms in three dimensions

D Pekurovsky - SIAM Journal on Scientific Computing, 2012 - SIAM
Fourier and related transforms are a family of algorithms widely employed in diverse areas
of computational science, notoriously difficult to scale on high-performance parallel …

The IBM Blue Gene/Q interconnection network and message unit

D Chen, NA Eisley, P Heidelberger… - Proceedings of 2011 …, 2011 - dl.acm.org
This is the first paper describing the IBM Blue Gene/Q interconnection network and message
unit. The Blue Gene/Q system is the third generation in the IBM Blue Gene line of massively …

Scalable hierarchical aggregation protocol (SHArP): A hardware architecture for efficient data reduction

RL Graham, D Bureddy, P Lui… - … in HPC (COMHPC), 2016 - ieeexplore.ieee.org
Increased system size and a greater reliance on utilizing system parallelism to achieve
computational needs, requires innovative system architectures to meet the simulation …

Evaluating HPC networks via simulation of parallel workloads

N Jain, A Bhatele, S White, T Gamblin… - SC'16: Proceedings of …, 2016 - ieeexplore.ieee.org
This paper presents an evaluation and comparison of three topologies that are popular for
building interconnection networks in large-scale supercomputers: torus, fat-tree, and …

The IBM Blue Gene/Q interconnection fabric

D Chen, N Eisley, P Heidelberger, R Senger… - IEEE Micro, 2011 - ieeexplore.ieee.org
This article describes the IBM Blue Gene/Q interconnection network and message unit. Blue
Gene/Q is the third generation in the IBM Blue Gene line of massively parallel …

Efficient all-to-all collective communication schedules for direct-connect topologies

P Basu, L Zhao, J Fantl, S Pal… - Proceedings of the 33rd …, 2024 - dl.acm.org
The all-to-all collective communications primitive is widely used in machine learning (ML)
and high performance computing (HPC) workloads, and optimizing its performance is of …

Looking under the hood of the IBM Blue Gene/Q network

D Chen, N Eisley, P Heidelberger… - SC'12: Proceedings …, 2012 - ieeexplore.ieee.org
This paper explores the performance and optimization of the IBM Blue Gene/Q (BG/Q) five
dimensional torus network on up to 16K nodes. The BG/Q hardware supports multiple …

Designing topology-aware collective communication algorithms for large scale InfiniBand clusters: Case studies with scatter and gather

K Kandalla, H Subramoni, A Vishnu… - … on Parallel & …, 2010 - ieeexplore.ieee.org
Modern high performance computing systems are being increasingly deployed in a
hierarchical fashion with multi-core computing platforms forming the base of the hierarchy …

[BOOK] Fast Fourier transform algorithms for parallel computers

D Takahashi - 2019 - Springer
The fast Fourier transform (FFT) is an efficient implementation of the discrete Fourier
transform (DFT). The FFT is widely used in numerous applications in engineering, science …