Cross-platform performance prediction of parallel applications using partial execution
Performance prediction across platforms is increasingly important as developers can choose
from a wide range of execution platforms. The main challenge remains to perform accurate …
Hermes: a heterogeneous-aware multi-tiered distributed I/O buffering system
Modern High-Performance Computing (HPC) systems are adding extra layers to the memory
and storage hierarchy named deep memory and storage hierarchy (DMSH), to increase I/O …
Topology-aware data movement and staging for I/O acceleration on Blue Gene/P supercomputing systems
There is growing concern that I/O systems will be hard pressed to satisfy the requirements of
future leadership-class machines. Even current machines are found to be I/O bound for …
Dynamically adapting file domain partitioning methods for collective I/O based on underlying parallel file system locking protocols
Collective I/O, such as that provided in MPI-IO, enables process collaboration among a
group of processes for greater I/O parallelism. Its implementation involves file domain …
Accelerating I/O forwarding in IBM Blue Gene/P systems
Current leadership-class machines suffer from a significant imbalance between their
computational power and their I/O bandwidth. I/O forwarding is a paradigm that attempts to …
Grid-based parallel data streaming implemented for the gyrokinetic toroidal code
We have developed a threaded parallel data streaming approach using Globus to transfer
multi-terabyte simulation data from a remote supercomputer to the scientist's home …
S4D-cache: Smart selective SSD cache for parallel I/O systems
Parallel file systems (PFS) are widely used in modern computing systems to mask the
ever-increasing performance gap between computing and data access. PFSs favor large …
Citron: Distributed Range Lock Management with One-sided RDMA
Range lock enables concurrent accesses to disjoint parts of a shared storage. However,
existing range lock managers rely on centralized CPU resources to process lock requests …
I/O acceleration via multi-tiered data buffering and prefetching
Modern High-Performance Computing (HPC) systems are adding extra layers to the
memory and storage hierarchy, named deep memory and storage hierarchy (DMSH), to …
An implementation and evaluation of client-side file caching for MPI-IO
Client-side file caching has long been recognized as a file system enhancement to reduce
the amount of data transfer between application processes and I/O servers. However …