Academic Search

[HTML] A taxonomy of task-based parallel programming technologies for high-performance computing

P Thoman, K Dichev, T Heller, R Iakymchuk… - The Journal of …, 2018 - Springer

Task-based programming models for shared memory—such as Cilk Plus and OpenMP 3—
are well established and documented. However, with the increase in parallel, many-core …

Cited by 199 · All 12 versions

[PDF] semanticscholar.org

Self-stabilizing iterative solvers

P Sao, R Vuduc - Proceedings of the workshop on latest advances in …, 2013 - dl.acm.org

We show how to use the idea of self-stabilization, which originates in the context of
distributed control, to make fault-tolerant iterative solvers. Generally, a self-stabilizing system …

Cited by 124 · All 4 versions

[PDF] wiley.com

Containment domains: A scalable, efficient and flexible resilience scheme for exascale systems

J Chung, I Lee, M Sullivan, JH Ryoo… - Scientific …, 2013 - content.iospress.com

This paper describes and evaluates a scalable and efficient resilience scheme based on the
concept of containment domains. Containment domains are a programming construct that …

Cited by 134 · All 22 versions

[PDF] arxiv.org

Resiliency in numerical algorithm design for extreme scale simulations

E Agullo, M Altenbernd, H Anzt… - … Journal of High …, 2022 - journals.sagepub.com

This work is based on the seminar titled 'Resiliency in Numerical Algorithm Design for
Extreme Scale Simulations' held March 1–6, 2020, at Schloss Dagstuhl, that was attended …

Cited by 13 · All 22 versions

[PDF] arxiv.org

Silent error detection in numerical time-stepping schemes

AR Benson, S Schmit… - The International Journal …, 2015 - journals.sagepub.com

Errors due to hardware or low-level software problems, if detected, can be fixed by various
schemes, such as recomputation from a checkpoint. Silent errors are errors in application …

Cited by 96 · All 11 versions

[PDF] sagepub.com

Resilience and fault tolerance in high-performance computing for numerical weather and climate prediction

T Benacchio, L Bonaventura… - … Journal of High …, 2021 - journals.sagepub.com

Progress in numerical weather and climate prediction accuracy greatly depends on the
growth of the available computing power. As the number of cores in top computing facilities …

Cited by 21 · All 14 versions

[PDF] uchicago.edu

When is multi-version checkpointing needed?

G Lu, Z Zheng, AA Chien - Proceedings of the 3rd Workshop on Fault …, 2013 - dl.acm.org

The scaling of semiconductor technology and increasing power concerns combined with
system scale make fault management a growing concern in high performance computing …

Cited by 75 · All 2 versions

[PDF] ncsu.edu

[PDF] Quantifying the impact of single bit flips on floating point arithmetic

J Elliott, F Mueller, F Stoyanov, C Webster - 2013 - repository.lib.ncsu.edu

In high-end computing, the collective surface area, smaller fabrication sizes, and increasing
density of components have led to an increase in the number of observed bit flips. Such flips …

Cited by 69 · All 9 versions

[PDF] upc.edu

Exploiting asynchrony from exact forward recovery for DUE in iterative solvers

L Jaulmes, M Casas, M Moretó, E Ayguadé… - Proceedings of the …, 2015 - dl.acm.org

This paper presents a method to protect iterative solvers from Detected and Uncorrected
Errors (DUE) relying on error detection techniques already available in commodity …

Cited by 45 · All 10 versions

[PDF] arxiv.org

Shrink or substitute: handling process failures in HPC systems using in-situ recovery

RA Ashraf, S Hukerikar… - 2018 26th Euromicro …, 2018 - ieeexplore.ieee.org

Efficient utilization of today's high-performance computing (HPC) systems with complex
software and hardware components requires that the HPC applications are designed to …

Cited by 31 · All 16 versions

Fault-tolerant iterative methods.
