RLScheduler: an automated HPC batch job scheduler using reinforcement learning

D Zhang, D Dai, Y He, FS Bao… - … Conference for High …, 2020 - ieeexplore.ieee.org
Today's high-performance computing (HPC) platforms are still dominated by batch jobs.
Accordingly, effective batch job scheduling is crucial to obtain high system efficiency …

Dividable configuration performance learning

J Gong, T Chen, R Bahsoon - IEEE Transactions on Software …, 2024 - ieeexplore.ieee.org
Machine/deep learning models have been widely adopted to predict the configuration
performance of software systems. However, a crucial yet unaddressed challenge is how to …

Interpreting write performance of supercomputer I/O systems with regression models

B Xie, Z Tan, P Carns, J Chase, K Harms… - 2021 IEEE …, 2021 - ieeexplore.ieee.org
This work seeks to advance the state of the art in HPC I/O performance analysis and
interpretation. In particular, we demonstrate effective techniques to: (1) model output …

The role of storage target allocation in applications' I/O performance with BeeGFS

F Boito, G Pallez, L Teylo - 2022 IEEE International Conference …, 2022 - ieeexplore.ieee.org
Parallel file systems are at the core of HPC I/O infrastructures. Those systems minimize the
I/O time of applications by separating files into fixed-size chunks and distributing them …

Design and implementation of I/O performance prediction scheme on HPC systems through large-scale log analysis

S Kim, A Sim, K Wu, S Byna, Y Son - Journal of Big Data, 2023 - Springer
Large-scale high performance computing (HPC) systems typically consist of many
thousands of CPUs and storage units used by hundreds to thousands of users …

SCTuner: An autotuner addressing dynamic I/O needs on supercomputer I/O subsystems

H Tang, B Xie, S Byna, P Carns… - 2021 IEEE/ACM …, 2021 - ieeexplore.ieee.org
In high-performance computing (HPC), scientific applications often manage a massive
amount of data using I/O libraries. These libraries provide convenient data model …

I/O Behind the Scenes: Bandwidth Requirements of HPC Applications with Asynchronous I/O

A Tarraf, JF Muñoz, DE Singh, T Ozden… - 2024 IEEE …, 2024 - ieeexplore.ieee.org
I/O bandwidth is a critical resource in an HPC cluster. As with all shared resources, its
availability is impacted significantly by the users and the applications they execute. Without …

User-based I/O Profiling for Leadership Scale HPC Workloads

AH Yazdani, AK Paul, AM Karimi, F Wang… - Proceedings of the 26th …, 2025 - dl.acm.org
I/O constitutes a significant portion of most of the application run-time. Spawning many such
applications concurrently on an HPC system leads to severe I/O contention. Thus …

Design and implementation of I/O performance prediction scheme on HPC systems

S Kim, A Sim, K Wu - Journal of Big Data, 10 (1), 2023 - escholarship.org
Large-scale high performance computing (HPC) systems typically consist of many
thousands of CPUs and storage units used by hundreds to thousands of users …

Estimation of the impact of I/O forwarding on application performance

FZ Boito - 2020 - inria.hal.science
In high performance computing architectures, the I/O forwarding technique is often used to
alleviate contention in the access to the shared parallel file system servers. Intermediate I/O …