PRIONN: Predicting runtime and IO using neural networks

MR Wyatt, S Herbein, T Gamblin, A Moody… - Proceedings of the 47th …, 2018 - dl.acm.org
For job allocation decision, current batch schedulers have access to and use only
information on the number of nodes and runtime because it is readily available at …

DCDB Wintermute: Enabling online and holistic operational data analytics on HPC systems

A Netti, M Müller, C Guillen, M Ott, D Tafani… - Proceedings of the 29th …, 2020 - dl.acm.org
As we approach the exascale era, the size and complexity of HPC systems continues to
increase, raising concerns about their manageability and sustainability. For this reason …

OKCM: improving parallel task scheduling in high-performance computing systems using online learning

J Li, X Zhang, L Han, Z Ji, X Dong, C Hu - The Journal of Supercomputing, 2021 - Springer
Task scheduling is becoming increasingly important in large-scale high-performance
computing real-time systems as the parallel scale, number and types of task continue to …

Capturing periodic I/O using frequency techniques

A Tarraf, A Bandet, F Boito, G Pallez… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Many HPC applications perform their I/O in bursts that follow a periodic pattern. This allows
for making predictions as to when a burst occurs. System providers can take advantage of …

Towards HPC I/O performance prediction through large-scale log analysis

S Kim, A Sim, K Wu, S Byna, Y Son, H Eom - Proceedings of the 29th …, 2020 - dl.acm.org
Large-scale high performance computing (HPC) systems typically consist of many
thousands of CPUs and storage units, while used by hundreds to thousands of users at the …

HPC I/O throughput bottleneck analysis with explainable local models

M Isakov, E Del Rosario, S Madireddy… - … Conference for High …, 2020 - ieeexplore.ieee.org
With the growing complexity of high-performance computing (HPC) systems, achieving high
performance can be difficult because of I/O bottlenecks. We analyze multiple years' worth of …

A conceptual framework for HPC operational data analytics

A Netti, W Shin, M Ott, T Wilde… - 2021 IEEE International …, 2021 - ieeexplore.ieee.org
This paper provides a broad framework for understanding trends in Operational Data
Analytics (ODA) for High-Performance Computing (HPC) facilities. The goal of ODA is to …

Silicon photonic Flex-LIONS for bandwidth-reconfigurable optical interconnects

X Xiao, R Proietti, G Liu, H Lu, P Fotouhi… - IEEE Journal of …, 2019 - ieeexplore.ieee.org
This paper reports the first experimental demonstration of silicon photonic (SiPh) Flex-
LIONS, a bandwidth-reconfigurable SiPh switching fabric based on wavelength routing in …

Understanding HPC application I/O behavior using system level statistics

AK Paul, O Faaland, A Moody… - 2020 IEEE 27th …, 2020 - ieeexplore.ieee.org
The processor performance of high performance computing (HPC) systems is increasing at
a much higher rate than storage performance. This imbalance leads to I/O performance …

Power profile monitoring and tracking evolution of system-wide HPC workloads

AM Karimi, NS Sattar, W Shin… - 2024 IEEE 44th …, 2024 - ieeexplore.ieee.org
The power & energy demands of HPC machines have grown significantly. Modern exascale
HPC systems require tens of megawatts of combined power for computing resources and …