Toward exascale resilience: 2014 update
Resilience is a major roadblock for HPC executions on future exascale systems. These
systems will typically gather millions of CPU cores running up to a billion threads …
Reproducibility, Replicability and Repeatability: A survey of reproducible research with a focus on high performance computing
Reproducibility is widely acknowledged as a fundamental principle in scientific research.
Currently, the scientific community grapples with numerous challenges associated with …
[BOOK] Fault tolerance techniques for high-performance computing
This chapter provides an introduction to resilience methods. The emphasis is on
checkpointing, the de-facto standard technique for resilience in High Performance …
Detection and correction of silent data corruption for large-scale high-performance computing
Faults have become the norm rather than the exception for high-end computing clusters.
Exacerbating this situation, some of these faults remain undetected, manifesting themselves …
Understanding the propagation of transient errors in HPC applications
Resiliency of exascale systems has quickly become an important concern for the scientific
community. Despite its importance, still much remains to be determined regarding how faults …
Evaluating the impact of SDC on the GMRES iterative solver
Increasing parallelism and transistor density, along with increasingly tighter energy and
peak power constraints, may force exposure of occasionally incorrect computation or …
Fault tolerant preconditioned conjugate gradient for sparse linear system solution
M Shantharam, S Srinivasmurthy… - Proceedings of the 26th …, 2012 - dl.acm.org
In scientific applications that involve dense matrices, checksum encodings have yielded
"algorithm-based fault tolerance" (ABFT) in the event of data corruption from either hard or …
Self-stabilizing iterative solvers
We show how to use the idea of self-stabilization, which originates in the context of
distributed control, to make fault-tolerant iterative solvers. Generally, a self-stabilizing system …
Classifying soft error vulnerabilities in extreme-scale scientific applications using a binary instrumentation tool
Extreme-scale scientific applications are at a significant risk of being hit by soft errors on
supercomputers as the scale of these systems and the component density continues to …
ERSA: Error resilient system architecture for probabilistic applications
There is a growing concern about the increasing vulnerability of future computing systems to
errors in the underlying hardware. Traditional redundancy techniques are expensive for …