A checkpoint of research on parallel I/O for high-performance computing

FZ Boito, EC Inacio, JL Bez, POA Navaux… - ACM Computing …, 2018 - dl.acm.org
We present a comprehensive survey on parallel I/O in the high-performance computing
(HPC) context. This is an important field for HPC because of the historic gap between …

I/O characterization and performance evaluation of BeeGFS for deep learning

F Chowdhury, Y Zhu, T Heer, S Paredes… - Proceedings of the 48th …, 2019 - dl.acm.org
Parallel File Systems (PFSs) are frequently deployed on leadership High Performance
Computing (HPC) systems to ensure efficient I/O, persistent storage and scalable …

Scheduling the I/O of HPC applications under congestion

A Gainaru, G Aupy, A Benoit, F Cappello… - 2015 IEEE …, 2015 - ieeexplore.ieee.org
A significant percentage of the computing capacity of large-scale platforms is wasted
because of interferences incurred by multiple applications that access a shared parallel file …

Revisiting I/O behavior in large-scale storage systems: The expected and the unexpected

T Patel, S Byna, GK Lockwood, D Tiwari - Proceedings of the …, 2019 - dl.acm.org
Large-scale applications typically spend a large fraction of their execution time performing
I/O to a parallel storage system. However, with rapid progress in compute and storage …

End-to-end I/O monitoring on leading supercomputers

B Yang, W Xue, T Zhang, S Liu, X Ma, X Wang… - ACM Transactions on …, 2023 - dl.acm.org
This paper offers a solution to overcome the complexities of production system I/O
performance monitoring. We present Beacon, an end-to-end I/O resource monitoring and …

Starship: Mitigating I/O bottlenecks in serverless computing for scientific workflows

R Basu Roy, D Tiwari - Proceedings of the ACM on Measurement and …, 2024 - dl.acm.org
This work highlights the significance of I/O bottlenecks that data-intensive HPC workflows
face in serverless environments, an issue that has been largely overlooked by prior works …

A year in the life of a parallel file system

GK Lockwood, S Snyder, T Wang… - … conference for high …, 2018 - ieeexplore.ieee.org
I/O performance is a critical aspect of data-intensive scientific computing. We seek to
advance the state of the practice in understanding and diagnosing I/O performance issues …

Access patterns and performance behaviors of multi-layer supercomputer I/O subsystems under production load

JL Bez, AM Karimi, AK Paul, B Xie, S Byna… - Proceedings of the 31st …, 2022 - dl.acm.org
Scientific computing workloads at HPC facilities have been shifting from traditional
numerical simulations to AI/ML applications for training and inference while processing and …

ElastiSim: a batch-system simulator for malleable workloads

T Özden, T Beringer, A Mazaheri, HM Fard… - Proceedings of the 51st …, 2022 - dl.acm.org
As high-performance computing infrastructures move towards exascale, the role of resource
and job management systems is more critical now than ever. Simulating batch systems to …

Ad hoc file systems for high-performance computing

A Brinkmann, K Mohror, W Yu, P Carns… - Journal of Computer …, 2020 - Springer
Storage backends of parallel compute clusters are still based mostly on magnetic disks,
while newer and faster storage technologies such as flash-based SSDs or non-volatile …