Fault tolerance of MPI applications in exascale systems: The ULFM solution

N Losada, P González, MJ Martín, G Bosilca… - Future Generation …, 2020 - Elsevier
The growth in the number of computational resources used by high-performance computing
(HPC) systems leads to an increase in failure rates. Fault-tolerant techniques will become …

Challenges in developing MPI fault-tolerant Fortran applications

N Weeks, G Luecke, P Maris… - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
Powerful high performance computing systems of the future are expected to have higher
failure rates than current systems. As a result, HPC applications running on such future …

Efficient Application-level Fault Tolerance Methods for Large Scale HPC Applications

F Shahzad - 2022 - opus4.kobv.de
Over the last decade, the continuous increase in supercomputers' computing power has
been made possible mainly due to the high degree of parallelism at the component level …

Towards integration of fault tolerance in agent-based systems

A Haqiq, B Bounabat - Procedia Computer Science, 2018 - Elsevier
The distributed system based on agent entities is a system that is subjected to failures
because of the complexity that characterizes the structure of each entity and to the various …

Flexibility measurement model of multi-agent systems

R Benaboud, T Marir - Multiagent and Grid Systems, 2020 - content.iospress.com
Flexibility is considered as one of the key objectives of agent-based technology. Despite
this, we still lack a fundamental understanding of just what “flexibility in multi-agent system …

Efficient Application-level Fault Tolerance Methods for Large Scale HPC Applications Effiziente Methoden für Fehlertoleranz auf Anwendungsebene für

F Shahzad - opus4.kobv.de
Over the last ten years, the continuous growth in the computing power of
supercomputers has, mainly due to the high degree of parallelism at the component level, …

The Initial Global Conference on Perceptual Computing in Data Sciences Moving Forward with Including Fault Tolerance in Agent-based Systems

SK Patra, PB Nayak - journal-dogorangsang.in
Because of the intricacy of each entity's structure and the many ways in which the multiple
agents interact with one another, distributed systems based on agent entities are prone to …

Refining Fortran Failed Images

N Weeks, G Luecke, G Prabhu - 2020 IEEE/ACM Fifth …, 2020 - ieeexplore.ieee.org
The Fortran 2018 standard introduced syntax and semantics that allow a parallel application
to recover from failed images (fail-stop processes) during execution. Teams are a key new …