Cross-platform performance prediction of parallel applications using partial execution

LT Yang, X Ma, F Mueller - SC'05: Proceedings of the 2005 …, 2005 - ieeexplore.ieee.org
Performance prediction across platforms is increasingly important as developers can choose
from a wide range of execution platforms. The main challenge remains to perform accurate …

Hermes: a heterogeneous-aware multi-tiered distributed I/O buffering system

A Kougkas, H Devarajan, XH Sun - Proceedings of the 27th International …, 2018 - dl.acm.org
Modern High-Performance Computing (HPC) systems are adding extra layers to the memory
and storage hierarchy named deep memory and storage hierarchy (DMSH), to increase I/O …

Topology-aware data movement and staging for I/O acceleration on Blue Gene/P supercomputing systems

V Vishwanath, M Hereld, V Morozov… - Proceedings of 2011 …, 2011 - dl.acm.org
There is growing concern that I/O systems will be hard pressed to satisfy the requirements of
future leadership-class machines. Even current machines are found to be I/O bound for …

Dynamically adapting file domain partitioning methods for collective I/O based on underlying parallel file system locking protocols

W Liao, A Choudhary - SC'08: Proceedings of the 2008 ACM …, 2008 - ieeexplore.ieee.org
Collective I/O, such as that provided in MPI-IO, enables process collaboration among a
group of processes for greater I/O parallelism. Its implementation involves file domain …

Accelerating I/O forwarding in IBM Blue Gene/P systems

V Vishwanath, M Hereld, K Iskra… - SC'10: Proceedings …, 2010 - ieeexplore.ieee.org
Current leadership-class machines suffer from a significant imbalance between their
computational power and their I/O bandwidth. I/O forwarding is a paradigm that attempts to …

Grid-based parallel data streaming implemented for the gyrokinetic toroidal code

S Klasky, S Ethier, Z Lin, K Martins, D McCune… - Proceedings of the …, 2003 - dl.acm.org
We have developed a threaded parallel data streaming approach using Globus to transfer
multi-terabyte simulation data from a remote supercomputer to the scientist's home …

S4D-cache: Smart selective SSD cache for parallel I/O systems

S He, XH Sun, B Feng - 2014 IEEE 34th International …, 2014 - ieeexplore.ieee.org
Parallel file systems (PFS) are widely used in modern computing systems to mask the
ever-increasing performance gap between computing and data access. PFSs favor large …

Citron: Distributed Range Lock Management with One-sided RDMA

J Gao, Y Lu, M Xie, Q Wang, J Shu - 21st USENIX Conference on File …, 2023 - usenix.org
Range lock enables concurrent accesses to disjoint parts of a shared storage. However,
existing range lock managers rely on centralized CPU resources to process lock requests …

I/O acceleration via multi-tiered data buffering and prefetching

A Kougkas, H Devarajan, XH Sun - Journal of Computer Science and …, 2020 - Springer
Modern High-Performance Computing (HPC) systems are adding extra layers to the
memory and storage hierarchy, named deep memory and storage hierarchy (DMSH), to …

An implementation and evaluation of client-side file caching for MPI-IO

W Liao, A Ching, K Coloma… - 2007 IEEE …, 2007 - ieeexplore.ieee.org
Client-side file caching has long been recognized as a file system enhancement to reduce
the amount of data transfer between application processes and I/O servers. However …