PRIONN: Predicting runtime and IO using neural networks
For job allocation decision, current batch schedulers have access to and use only
information on the number of nodes and runtime because it is readily available at …
DCDB wintermute: Enabling online and holistic operational data analytics on HPC systems
A Netti, M Müller, C Guillen, M Ott, D Tafani… - Proceedings of the 29th …, 2020 - dl.acm.org
As we approach the exascale era, the size and complexity of HPC systems continues to
increase, raising concerns about their manageability and sustainability. For this reason …
OKCM: improving parallel task scheduling in high-performance computing systems using online learning
J Li, X Zhang, L Han, Z Ji, X Dong, C Hu - The Journal of Supercomputing, 2021 - Springer
Task scheduling is becoming increasingly important in large-scale high-performance
computing real-time systems as the parallel scale, number and types of task continue to …
Capturing periodic I/O using frequency techniques
Many HPC applications perform their I/O in bursts that follow a periodic pattern. This allows
for making predictions as to when a burst occurs. System providers can take advantage of …
Towards HPC I/O performance prediction through large-scale log analysis
Large-scale high performance computing (HPC) systems typically consist of many
thousands of CPUs and storage units, while used by hundreds to thousands of users at the …
HPC I/O throughput bottleneck analysis with explainable local models
With the growing complexity of high-performance computing (HPC) systems, achieving high
performance can be difficult because of I/O bottlenecks. We analyze multiple years' worth of …
A conceptual framework for HPC operational data analytics
This paper provides a broad framework for understanding trends in Operational Data
Analytics (ODA) for High-Performance Computing (HPC) facilities. The goal of ODA is to …
Silicon photonic Flex-LIONS for bandwidth-reconfigurable optical interconnects
This paper reports the first experimental demonstration of silicon photonic (SiPh) Flex-
LIONS, a bandwidth-reconfigurable SiPh switching fabric based on wavelength routing in …
Understanding HPC application I/O behavior using system level statistics
AK Paul, O Faaland, A Moody… - 2020 IEEE 27th …, 2020 - ieeexplore.ieee.org
The processor performance of high performance computing (HPC) systems is increasing at
a much higher rate than storage performance. This imbalance leads to I/O performance …
Power profile monitoring and tracking evolution of system-wide HPC workloads
The power & energy demands of HPC machines have grown significantly. Modern exascale
HPC systems require tens of megawatts of combined power for computing resources and …