Fault tolerance of MPI applications in exascale systems: The ULFM solution
The growth in the number of computational resources used by high-performance computing
(HPC) systems leads to an increase in failure rates. Fault-tolerant techniques will become …
Challenges in developing MPI fault-tolerant Fortran applications
N Weeks, G Luecke, P Maris… - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
Powerful high performance computing systems of the future are expected to have higher
failure rates than current systems. As a result, HPC applications running on such future …
Efficient Application-level Fault Tolerance Methods for Large Scale HPC Applications
F Shahzad - 2022 - opus4.kobv.de
Over the last decade, the continuous increase in supercomputers' computing power has
been made possible mainly due to the high degree of parallelism at the component level …
Towards integration of fault tolerance in agent-based systems
The distributed system based on agent entities is a system that is subjected to failures
because of the complexity that characterizes the structure of each entity and to the various …
Flexibility measurement model of multi-agent systems
Flexibility is considered as one of the key objectives of agent-based technology. Despite
this, we still lack a fundamental understanding of just what “flexibility in multi-agent system …
Efficient Application-level Fault Tolerance Methods for Large Scale HPC Applications Effiziente Methoden für Fehlertoleranz auf Anwendungsebene für
F Shahzad - opus4.kobv.de
Over the last ten years, the continuous growth in the computing power of
supercomputers has been made possible mainly by the high degree of parallelism at the component level …
The Initial Global Conference on Perceptual Computing in Data Sciences Moving Forward with Including Fault Tolerance in Agent-based Systems
SK Patra, PB Nayak - journal-dogorangsang.in
Because of the intricacy of each entity's structure and the many ways in which the multiple
agents interact with one another, distributed systems based on agent entities are prone to …
Refining Fortran Failed Images
N Weeks, G Luecke, G Prabhu - 2020 IEEE/ACM Fifth …, 2020 - ieeexplore.ieee.org
The Fortran 2018 standard introduced syntax and semantics that allow a parallel application
to recover from failed images (fail-stop processes) during execution. Teams are a key new …