Toward exascale resilience: 2014 update
Resilience is a major roadblock for HPC executions on future exascale systems. These
systems will typically gather millions of CPU cores running up to a billion threads …
Design, modeling, and evaluation of a scalable multi-level checkpointing system
High-performance computing (HPC) systems are growing more powerful by utilizing more
hardware components. As the system mean-time-before-failure correspondingly drops …
HOT SAX: Efficiently finding the most unusual time series subsequence
In this work, we introduce the new problem of finding time series discords. Time series
discords are subsequences of a longer time series that are maximally different to all the rest …
Online-ABFT: An online algorithm based fault tolerance scheme for soft error detection in iterative methods
Z Chen - ACM SIGPLAN Notices, 2013 - dl.acm.org
Soft errors are one-time events that corrupt the state of a computing system but not its overall
functionality. Large supercomputers are especially susceptible to soft errors because of their …
Condition numbers of Gaussian random matrices
Let G_{m \times n} be an m \times n real random matrix whose elements are independent and identically
distributed standard normal random variables, and let \kappa_2(G_{m \times n}) be the 2-norm …
Algorithm-based fault tolerance for fail-stop failures
Fail-stop failures in distributed environments are often tolerated by checkpointing or
message logging. In this paper, we show that fail-stop process failures in ScaLAPACK matrix …
High performance linpack benchmark: a fault tolerant implementation without checkpointing
The probability that a failure will occur before the end of the computation increases as the
number of processors used in a high performance computing application increases. For long …
Algorithm-based recovery for iterative methods without checkpointing
Z Chen - Proceedings of the 20th international symposium on …, 2011 - dl.acm.org
In today's high performance computing practice, fail-stop failures are often tolerated by
checkpointing. While checkpointing is a very general technique and can often be applied to …
The reliability wall for exascale supercomputing
X Yang, Z Wang, J Xue, Y Zhou - IEEE Transactions on …, 2011 - ieeexplore.ieee.org
Reliability is a key challenge to be understood to turn the vision of exascale supercomputing
into reality. Inevitably, large-scale supercomputing systems, especially those at the …
Algorithm-based checkpoint-free fault tolerance for parallel matrix computations on volatile resources
As the size of today's high performance computers increases from hundreds, to thousands,
and even tens of thousands of processors, node failures in these computers are becoming …