Fundamentals of fault-tolerant distributed computing in asynchronous environments
FC Gärtner - ACM Computing Surveys (CSUR), 1999 - dl.acm.org
Fault tolerance in distributed computing is a wide area with a significant body of literature
that is vastly diverse in methodology and terminology. This paper aims at structuring the …
The failure detector abstraction
A failure detector is a fundamental abstraction in distributed computing. This article surveys
this abstraction through two dimensions. First we study failure detectors as building blocks to …
Failure detection and consensus in the crash-recovery model
We study the problems of failure detection and consensus in asynchronous systems in
which processes may crash and recover, and links may lose messages. We first propose …
Leader-based consensus
It is now well recognized that consensus is a fundamental problem one has to solve to
implement reliable applications on top of unreliable asynchronous distributed systems prone …
The generic consensus service
This paper describes a modular approach for the construction of fault-tolerant agreement
protocols. The approach is based on a generic consensus service. Fault-tolerant agreement …
Consensus system for solving conflicts in distributed systems
NT Nguyen - Information Sciences, 2002 - Elsevier
By a data conflict in a distributed system we understand a situation (or a state of the system)
in which the system sites generate and store different versions of data which represent the …
Consensus in asynchronous distributed systems: A concise guided tour
It is now recognized that the Consensus problem is a fundamental problem when one has to
design and implement reliable asynchronous distributed systems. This chapter is on the …
Fault-tolerant total order multicast to asynchronous groups
U Fritzke, P Ingels, A Mostéfaoui… - … IEEE Symposium on …, 1998 - ieeexplore.ieee.org
While Total Order Broadcast (or Atomic Broadcast) primitives have received a lot of attention,
the paper concentrates on Total Order Multicast to Multiple Groups in the context of …
On quiescent reliable communication
We study the problem of achieving reliable communication with quiescent algorithms (i.e.,
algorithms that eventually stop sending messages) in asynchronous systems with process …