A survey of rollback-recovery protocols in message-passing systems

EN Elnozahy, L Alvisi, YM Wang… - ACM Computing Surveys …, 2002 - dl.acm.org
This survey covers rollback-recovery techniques that do not require special language
constructs. In the first part of the survey we classify rollback-recovery protocols into …

[BOOK][B] Distributed algorithms for message-passing systems

M Raynal - 2013 - Springer
Michel Raynal. Distributed Algorithms for Message-Passing Systems …

A survey of checkpointing algorithms for parallel and distributed computers

S Kalaiselvi, V Rajaraman - Sadhana, 2000 - Springer
Checkpoint is defined as a designated place in a program at which normal processing is
interrupted specifically to preserve the status information necessary to allow resumption of …

[BOOK][B] Distributed system design

J Wu - 2017 - taylorfrancis.com
Future requirements for computing speed, system reliability, and cost-effectiveness entail the
development of alternative computers to replace the traditional von Neumann organization …

[BOOK][B] Introduction to reversible computing

KS Perumalla - 2013 - books.google.com
Few books comprehensively cover the software and programming aspects of reversible
computing. Filling this gap, Introduction to Reversible Computing offers an expanded view of …

Mutable checkpoints: a new checkpointing approach for mobile computing systems

G Cao, M Singhal - IEEE Transactions on Parallel and …, 2001 - ieeexplore.ieee.org
Mobile computing raises many new issues such as lack of stable storage, low bandwidth of
wireless channel, high mobility, and limited battery life. These new issues make traditional …

Quasi-synchronous checkpointing: Models, characterization, and classification

D Manivannan, M Singhal - IEEE Transactions on Parallel and …, 1999 - ieeexplore.ieee.org
Checkpointing algorithms are classified as synchronous and asynchronous in the literature.
In synchronous checkpointing, processes synchronize their checkpointing activities so that a …

Communication-based prevention of useless checkpoints in distributed computations

JM Hélary, A Mostefaoui, RHB Netzer, M Raynal - Distributed Computing, 2000 - Springer
A useless checkpoint is a local checkpoint that cannot be part of a consistent global
checkpoint. This paper addresses the following problem. Given a set of processes that take …

[BOOK][B] Design and analysis of reliable and fault-tolerant computer systems

MI Abd-el-barr - 2006 - books.google.com
Covering both the theoretical and practical aspects of fault-tolerant mobile systems, and fault
tolerance and analysis, this book tackles the current issues of reliability-based optimization …

Communication-induced determination of consistent snapshots

JM Hélary, A Mostefaoui, M Raynal - IEEE Transactions on …, 1999 - ieeexplore.ieee.org
A classical way to determine consistent snapshots consists in using Chandy-Lamport's
algorithm. This algorithm relies on specific control messages that allow processes to …