PLFS: a checkpoint filesystem for parallel applications

J Bent, G Gibson, G Grider, B McClelland… - Proceedings of the …, 2009 - dl.acm.org
Parallel applications running across thousands of processors must protect themselves from
inevitable system failures. Many applications insulate themselves from failures by …

End-to-end I/O monitoring on leading supercomputers

B Yang, W Xue, T Zhang, S Liu, X Ma, X Wang… - ACM Transactions on …, 2023 - dl.acm.org
This paper offers a solution to overcome the complexities of production system I/O
performance monitoring. We present Beacon, an end-to-end I/O resource monitoring and …

... And eat it too: High read performance in write-optimized HPC I/O middleware file formats

M Polte, J Lofstead, J Bent, G Gibson… - Proceedings of the 4th …, 2009 - dl.acm.org
As HPC applications run on increasingly high process counts on larger and larger
machines, both the frequency of checkpoints needed for fault tolerance [14] and the …

The power and challenges of transformative I/O

A Manzanares, J Bent, M Wingate… - 2012 IEEE International …, 2012 - ieeexplore.ieee.org
Extracting high data bandwidth and metadata rates from parallel file systems is notoriously
difficult. User workloads almost never achieve the performance of synthetic benchmarks …

HPC computation on Hadoop storage with PLFS

C Cranor, M Polte, G Gibson - Parallel Data Laboratory at Carnegie Mellon …, 2012 - Citeseer
In this report we describe how we adapted the Parallel Log Structured Filesystem (PLFS) to
enable HPC applications to be able to read and write data from the HDFS cloud storage …

Structuring PLFS for extensibility

C Cranor, M Polte, G Gibson - Proceedings of the 8th Parallel Data …, 2013 - dl.acm.org
The Parallel Log Structured Filesystem (PLFS) [5] was designed to transparently transform
highly concurrent, massive high-performance computing (HPC) N-to-1 checkpoint workloads …

RESS: A reliable energy-efficient storage system

S Yin, Z Xiao, K Li, J Huang, X Ruan… - 2016 IEEE 22nd …, 2016 - ieeexplore.ieee.org
Extracting high I/O performance from parallel file systems is no longer the only goal in
modern data centres. As issues of the Energy Wall and the Reliability Wall become …

... And Eat It Too: High Read Performance in Write-Optimized HPC I/O Middleware File Formats (CMU-PDL-09-111)

M Polte, J Lofstead, J Bent, G Gibson, SA Klasky, Q Liu… - 2009 - pdsw.org
As HPC applications run on increasingly high process counts on larger and larger
machines, both the frequency of checkpoints needed for fault tolerance [14] and the …

optimized HPC I/O middleware file formats

J Lofstead, M Parashar, N Podhorszki, K Schwan - 2009 - academia.edu
As HPC applications run on increasingly high process counts on larger and larger
machines, both the frequency of checkpoints needed for fault tolerance [14] and the …