Multi-axis decomposition of density functional program for strong scaling up to 82,944 nodes on the K computer: Compactly folded 3D-FFT communicators in the 6D …

T Yamasaki, A Kuroda, T Kato, J Nara, J Koga… - Computer Physics …, 2019 - Elsevier
Density functional calculations with a plane-wave basis set are widely used in materials
science. Due to recent developments in high-performance computers, the number of nodes …

Scalable Hierarchical Aggregation and Reduction Protocol (SHARP)™ Streaming-Aggregation Hardware Design and Evaluation

RL Graham, L Levi, D Burredy, G Bloch… - … Conference, ISC High …, 2020 - Springer
This paper describes the new hardware-based streaming-aggregation capability added to
Mellanox's Scalable Hierarchical Aggregation and Reduction Protocol in its HDR InfiniBand …

Performance evaluation of ultra-large-scale first-principles electronic structure calculation code on the K computer

Y Hasegawa, JI Iwata, M Tsuji… - … journal of high …, 2014 - journals.sagepub.com
Silicon nanowires are potentially useful in next-generation field-effect transistors, and it is
important to clarify the electron states of silicon nanowires to know the behavior of new …

MPI sessions: leveraging runtime infrastructure to increase scalability of applications at exascale

D Holmes, K Mohror, RE Grant, A Skjellum… - Proceedings of the 23rd …, 2016 - dl.acm.org
MPI includes all processes in MPI_COMM_WORLD; this is untenable for reasons of scale,
resiliency, and overhead. This paper offers a new approach, extending MPI with a new …

The K computer operations: experiences and statistics

K Yamamoto, A Uno, H Murai, T Tsukamoto… - Procedia Computer …, 2014 - Elsevier
The K computer, released on September 29, 2012, is a large-scale parallel supercomputer
system consisting of 82,944 compute nodes. We have been able to resolve a significant …

Communication sparsity in distributed spiking neural network simulations to improve scalability

C Fernandez-Musoles, D Coca… - Frontiers in …, 2019 - frontiersin.org
In the last decade there has been a surge in the number of big science projects interested in
achieving a comprehensive understanding of the functions of the brain, using Spiking …

A holistic smart home demonstrator for anomaly detection and response

J Lundström, WO De Morais… - 2015 IEEE International …, 2015 - ieeexplore.ieee.org
Applying machine learning methods in scenarios involving smart homes is a complex task.
The many possible variations of sensors, feature representations, machine learning …

Collective algorithms for multiported torus networks

P Sack, W Gropp - ACM Transactions on Parallel Computing (TOPC), 2015 - dl.acm.org
Modern supercomputers with torus networks allow each node to simultaneously pass
messages on all of its links. However, most collective algorithms are designed to only use …

Communication optimization technology based on network dynamic performance model

X Cui, X Li, B Wang - Mathematical Problems in Engineering, 2020 - Wiley Online Library
This work analyses different communication modes in applications of supercomputing,
proposes a communication dynamic performance model based on topology awareness, and …

Improved strong scaling of a spectral/finite difference gyrokinetic code for multi-scale plasma turbulence

S Maeyama, T Watanabe, Y Idomura, M Nakata… - Parallel Computing, 2015 - Elsevier
Optimization techniques of a plasma turbulence simulation code GKV for improved strong
scaling are presented. This work is motivated by multi-scale plasma turbulence extending …