Software approaches for resilience of high performance computing systems: a survey

J Jia, Y Liu, G Zhang, Y Gao, D Qian - Frontiers of Computer Science, 2023 - Springer
With the scaling up of high-performance computing systems in recent years, their reliability
has been descending continuously. Therefore, system resilience has been regarded as one …

Optimization of multi-level checkpoint model for large scale HPC applications

S Di, MS Bouguerra, L Bautista-Gomez… - 2014 IEEE 28th …, 2014 - ieeexplore.ieee.org
HPC community projects that future extreme scale systems will be much less stable than
current Petascale systems, thus requiring sophisticated fault tolerance to guarantee the …

Dark sky simulations: Early data release

SW Skillman, MS Warren, MJ Turk… - ar…

… on Titan supercomputer

K Kurte, J Sanyal, A Berres, D Lunga… - Concurrency and …, 2019 - Wiley Online Library
This paper presents a scalable object detection workflow for detecting objects, such as
settlements, from remotely sensed (RS) imagery. We have successfully deployed this …