Revisiting I/O behavior in large-scale storage systems: The expected and the unexpected

T Patel, S Byna, GK Lockwood, D Tiwari - Proceedings of the …, 2019 - dl.acm.org
Large-scale applications typically spend a large fraction of their execution time performing
I/O to a parallel storage system. However, with rapid progress in compute and storage …

A year in the life of a parallel file system

GK Lockwood, S Snyder, T Wang… - … conference for high …, 2018 - ieeexplore.ieee.org
I/O performance is a critical aspect of data-intensive scientific computing. We seek to
advance the state of the practice in understanding and diagnosing I/O performance issues …

Systematically inferring I/O performance variability by examining repetitive job behavior

E Costa, T Patel, B Schwaller, JM Brandt… - Proceedings of the …, 2021 - dl.acm.org
Monitoring and analyzing I/O behaviors is critical to the efficient utilization of parallel storage
systems. Unfortunately, with increasing I/O requirements and resource contention, I/O …

Uncovering access, reuse, and sharing characteristics of I/O-intensive files on large-scale production HPC systems

T Patel, S Byna, GK Lockwood, NJ Wright… - … USENIX Conference on …, 2020 - usenix.org
Large-scale high-performance computing (HPC) applications running on supercomputers
produce large amounts of data routinely and store it in files on multi-PB shared parallel …

Real-time I/O-monitoring of HPC applications with SIOX, elasticsearch, Grafana and FUSE

E Betke, J Kunkel - High Performance Computing: ISC High Performance …, 2017 - Springer
The starting point for our work was a demand for an overview of application's I/O behavior,
that provides information about the usage of our HPC “Mistral”. We suspect that some …

UMAMI: a recipe for generating meaningful metrics through holistic I/O performance analysis

GK Lockwood, W Yoo, S Byna, NJ Wright… - Proceedings of the 2nd …, 2017 - dl.acm.org
I/O efficiency is essential to productivity in scientific computing, especially as many scientific
domains become more data-intensive. Many characterization tools have been used to …

A comprehensive I/O knowledge cycle for modular and automated HPC workload analysis

Z Zhu, S Neuwirth, T Lippert - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
On the way to the exascale era, millions of parallel processing elements are required.
Accordingly, one major challenge is the ever-widening gap between computational power …

AI-coupled HPC workflows

S Jha, VR Pascuzzi, M Turilli - arXiv preprint arXiv:2208.11745, 2022 - arxiv.org
Increasingly, scientific discovery requires sophisticated and scalable workflows. Workflows
have become the “new applications,” wherein multi-scale computing campaigns comprise …

Improving collective I/O performance with machine learning supported auto-tuning

A Bağbaba - 2020 IEEE International Parallel and Distributed …, 2020 - ieeexplore.ieee.org
Collective input and output (I/O) is an essential approach in high performance computing
(HPC) applications. The achievement of effective collective I/O is a nontrivial job due to the …

Tools for analyzing parallel I/O

JM Kunkel, E Betke, M Bryson, P Carns… - … Computing: ISC High …, 2018 - Springer
Parallel application I/O performance often does not meet user expectations. Additionally,
slight access pattern modifications may lead to significant changes in performance due to …