Optimization of a multilevel checkpoint model with uncertain execution scales
Future extreme-scale systems are expected to experience different types of failures affecting
applications with different failure scales, from transient uncorrectable memory errors in …
[PDF][PDF] Topology-aware job scheduling strategies for torus networks
● #PBS -l nodeset=ONEOF:FEATURE:s1_24x6x24,s2_24x6x24,… ● Job will run in first
available requested feature● Good run time consistency for PSDNS in 24x8x24 nodeset● …
Early experiences scaling VMD molecular visualization and analysis jobs on Blue Waters
Petascale molecular dynamics simulations provide a powerful tool for probing the dynamics
of cellular processes at atomic and nanosecond resolution not achievable by experimental …
Breaking and fixing the self encryption scheme for data security in mobile devices
Data security is one of the major challenges that prevents the wider acceptance of mobile
devices, especially within business and government environments. It is non-trivial to protect …
[BOOK][B] Failure avoidance techniques for HPC systems based on failure prediction
A Gainaru - 2015 - search.proquest.com
An increasingly larger percentage of computing capacity in today's large high-performance
computing systems is wasted due to failures and recoveries. Moreover, it is expected that …
[PDF][PDF] Expanding Blue Waters with improved acceleration capability
Blue Waters, the first open-science supercomputer to achieve a sustained rate of one
petaflop/s on a broad mix of scientific applications, is the largest system ever built by Cray. It …
[PDF][PDF] A classification of parallel I/O toward demystifying HPC I/O best practices
R Sisneros - Proceedings of Cray User Group Meeting (CUG-2016), 2016 - cug.org
The process of optimizing parallel I/O can quite easily become daunting. By the nature of its
implementation there are many highly sensitive, tunable parameters and a subtle change to …
Resiliency of high-performance computing systems: A fault-injection-based characterization of the high-speed network in the Blue Waters testbed
SS Tang - 2018 - ideals.illinois.edu
Supercomputers have played an essential role in the progress of science and engineering
research. As the high-performance computing (HPC) community moves towards the next …
[PDF][PDF] How Deep is Your I/O? Toward Practical Large-Scale I/O Optimization via Machine Learning Methods
Performance-related diagnostic data routinely collected by administrators of HPC machines
is an excellent target for the application of machine learning approaches. There is a clear …