Toward exascale resilience: 2014 update
Resilience is a major roadblock for HPC executions on future exascale systems. These
systems will typically gather millions of CPU cores running up to a billion threads …
Post-failure recovery of MPI communication capability: Design and rationale
As supercomputers are entering an era of massive parallelism where the frequency of faults
is increasing, the MPI Standard remains distressingly vague on the consequence of failures …
An evaluation of user-level failure mitigation support in MPI
As the scale of computing platforms becomes increasingly extreme, the requirements for
application fault tolerance are increasing as well. Techniques to address this problem by …
Evaluating and extending user-level fault tolerance in MPI applications
The user-level failure mitigation (ULFM) interface has been proposed to provide fault-
tolerant semantics in the Message Passing Interface (MPI). Previous work presented …
Local rollback for resilient MPI applications with application-level checkpointing and message logging
The resilience approach generally used in high-performance computing (HPC) relies on
coordinated checkpoint/restart, a global rollback of all the processes that are running the …
Parallel processing strategies for big geospatial data
M Werner - Frontiers in big Data, 2019 - frontiersin.org
This paper provides an abstract analysis of parallel processing strategies for spatial and
spatio-temporal data. It isolates aspects such as data locality and computational locality as …
Evaluating user-level fault tolerance for MPI applications
The User Level Failure Mitigation (ULFM) interface has been proposed to provide fault-
tolerant semantics in MPI. Previous work has presented performance evaluations of the …
Twister2: Design of a big data toolkit
Data‐driven applications are essential to handle the ever‐increasing volume, velocity, and
veracity of data generated by sources such as the Web and Internet of Things (IoT) devices …
Accelerating seismic redatuming using tile low-rank approximations on NEC SX-Aurora TSUBASA
With the aim of imaging subsurface discontinuities, seismic data recorded at the surface of
the Earth must be numerically re-positioned at locations in the subsurface where reflections …