Resiliency in numerical algorithm design for extreme scale simulations
This work is based on the seminar titled 'Resiliency in Numerical Algorithm Design for
Extreme Scale Simulations' held March 1–6, 2020, at Schloss Dagstuhl, that was attended …
Resilience for massively parallel multigrid solvers
Fault tolerant massively parallel multigrid methods for elliptic partial differential equations
are a step towards resilient solvers. Here, we combine domain partitioning with geometric …
Recent developments in the theory and application of the sparse grid combination technique
Substantial modifications of both the choice of the grids, the combination coefficients, the
parallel data structures and the algorithms used for the combination technique lead to …
Complex scientific applications made fault-tolerant with the sparse grid combination technique
Ultra-large–scale simulations via solving partial differential equations (PDEs) require very
large computational systems for their timely solution. Studies have shown the rate of failure grows …
A highly scalable, algorithm-based fault-tolerant solver for gyrokinetic plasma simulations
With future exascale computers expected to have millions of compute units distributed
among thousands of nodes, system faults are predicted to become more frequent. Fault …
A massively parallel combination technique for the solution of high-dimensional PDEs
M Heene - 2018 - core.ac.uk
The solution of high-dimensional problems, especially high-dimensional partial differential
equations (PDEs) that require the joint discretization of more than the usual three spatial …
Fault-Tolerant Parallel Multigrid Method on Unstructured Adaptive Mesh
As the generation of exascale high-performance clusters begins, it has become evident that
numerical algorithms will greatly benefit from built-in resilience features that can handle …
EXAHD: a massively parallel fault tolerant sparse grid approach for high-dimensional turbulent plasma simulations
R Lago, M Obersteiner, T Pollinger… - Software for Exascale …, 2020 - library.oapen.org
Plasma fusion is one of the promising candidates for an emission-free energy source and is
heavily investigated with high-resolution numerical simulations. Unfortunately, these …
Handling silent data corruption with the sparse grid combination technique
We describe two algorithms to detect and filter silent data corruption (SDC) when solving
time-dependent PDEs with the Sparse Grid Combination Technique (SGCT). The SGCT …
A spatially adaptive and massively parallel implementation of the fault-tolerant combination technique
MJ Obersteiner - 2021 - mediatum.ub.tum.de
In this work, we discuss measures to increase the scalability, robustness, and efficiency of
the Combination Technique. In particular, we introduce an asynchronous variant and …