Total order broadcast and multicast algorithms: Taxonomy and survey

X Défago, A Schiper, P Urbán - ACM Computing Surveys (CSUR), 2004 - dl.acm.org
Total order broadcast and multicast (also called atomic broadcast/multicast) present an
important problem in distributed systems, especially with respect to fault-tolerance. In short …

'Cause I'm strong enough: Reasoning about consistency choices in distributed systems

A Gotsman, H Yang, C Ferreira, M Najafzadeh… - Proceedings of the 43rd …, 2016 - dl.acm.org
Large-scale distributed systems often rely on replicated databases that allow a programmer
to request different data consistency guarantees for different operations, and thereby control …

P4xos: Consensus as a network service

HT Dang, P Bressana, H Wang, KS Lee… - IEEE/ACM …, 2020 - ieeexplore.ieee.org
In this paper, we explore how a programmable forwarding plane offered by a new breed of
network switches might naturally accelerate consensus protocols, specifically focusing on …

State-machine replication for planet-scale systems

V Enes, C Baquero, TF Rezende, A Gotsman… - Proceedings of the …, 2020 - dl.acm.org
Online applications now routinely replicate their data at multiple sites around the world. In
this paper we present Atlas, the first state-machine replication protocol tailored for such …

The failure detector abstraction

FC Freiling, R Guerraoui, P Kuznetsov - ACM Computing Surveys …, 2011 - dl.acm.org
A failure detector is a fundamental abstraction in distributed computing. This article surveys
this abstraction through two dimensions. First we study failure detectors as building blocks to …

Advanced pattern recognition for detection of complex software aging phenomena in online transaction processing servers

KJ Cassidy, KC Gross… - … on dependable systems …, 2002 - ieeexplore.ieee.org
Software aging phenomena have been recently studied; one particularly complex type is
shared memory pool latch contention in large OLTP servers. Latch contention onset leads to …

Rex: Replication at the speed of multi-core

Z Guo, C Hong, M Yang, D Zhou, L Zhou… - Proceedings of the Ninth …, 2014 - dl.acm.org
Standard state-machine replication involves consensus on a sequence of totally ordered
requests through, for example, the Paxos protocol. Such a sequential execution model is …
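The sequential execution model mentioned above can be illustrated with a toy sketch. It is not Rex's design. It assumes the totally ordered log has already been agreed, for example by Paxos, and does not model agreement itself. The command format and the `Replica` and `replay` names are hypothetical. Every replica applies the same log one command at a time, so deterministic replicas end in identical states.

```python
# Minimal sketch of sequential state-machine replication.
# Assumption: the totally ordered log has already been agreed by some
# consensus protocol (e.g. Paxos); agreement itself is NOT modelled here.
from typing import Dict, List, Tuple

Command = Tuple[str, str, int]  # hypothetical format: (op, key, value)


class Replica:
    """A deterministic key-value state machine (hypothetical example)."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}

    def apply(self, cmd: Command) -> None:
        op, key, value = cmd
        if op == "set":
            self.state[key] = value
        elif op == "add":
            self.state[key] = self.state.get(key, 0) + value
        else:
            raise ValueError(f"unknown op {op!r}")


def replay(log: List[Command]) -> Dict[str, int]:
    """Apply every command of the agreed log sequentially, in log order."""
    replica = Replica()
    for cmd in log:
        replica.apply(cmd)
    return replica.state


agreed_log: List[Command] = [("set", "x", 1), ("add", "x", 2), ("set", "y", 5)]
replicas = [replay(agreed_log) for _ in range(3)]
print(replicas[0])  # {'x': 3, 'y': 5}
```

Because every command runs one after another on a single thread, throughput is bounded by one core. That sequential bottleneck is the one the snippet alludes to.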

Rethinking state-machine replication for parallelism

PJ Marandi, CE Bezerra… - 2014 IEEE 34th …, 2014 - ieeexplore.ieee.org
State-machine replication, a fundamental approach to designing fault-tolerant services,
requires commands to be executed in the same order by all replicas. Moreover, command …

Handling message semantics with generic broadcast protocols

F Pedone, A Schiper - Distributed Computing, 2002 - Springer
Message ordering is a fundamental abstraction in distributed systems. However, ordering
guarantees are usually purely “syntactic,” that is, message “semantics” is not taken into …
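The contrast between "syntactic" and "semantic" ordering can be shown with a toy example. This is not the paper's protocol. The `conflict` relation below is a hypothetical choice: two commands conflict if they touch the same key and at least one is a `set`. Non-conflicting commands, such as increments, commute, so the order in which replicas deliver them does not change the final state. Only conflicting pairs need a common order.

```python
# Toy illustration of why message semantics matters for ordering.
# Assumption: a hypothetical conflict relation in which commands conflict
# iff they touch the same key and at least one of them is a "set".
from itertools import permutations
from typing import Dict, Iterable, List, Tuple

Command = Tuple[str, str, int]  # hypothetical format: (op, key, value)


def apply_all(cmds: Iterable[Command]) -> Dict[str, int]:
    """Apply commands in the given delivery order and return the final state."""
    state: Dict[str, int] = {}
    for op, key, value in cmds:
        if op == "set":
            state[key] = value
        else:  # "add"
            state[key] = state.get(key, 0) + value
    return state


def conflict(a: Command, b: Command) -> bool:
    """Hypothetical conflict relation: same key and at least one "set"."""
    return a[1] == b[1] and "set" in (a[0], b[0])


def order_matters(cmds: List[Command]) -> bool:
    """True if some delivery order yields a different final state."""
    outcomes = {tuple(sorted(apply_all(p).items())) for p in permutations(cmds)}
    return len(outcomes) > 1


adds = [("add", "x", 1), ("add", "x", 2), ("add", "y", 3)]  # no conflicts
mixed = [("set", "x", 1), ("add", "x", 2)]                  # one conflicting pair
print(order_matters(adds))   # False
print(order_matters(mixed))  # True
```

A purely syntactic total order would sequence all three increments anyway. A semantics-aware order could deliver them in any order and pay the ordering cost only for `mixed`.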

SwiftPaxos: Fast Geo-Replicated State Machines

F Ryabinin, A Gotsman, P Sutra - 21st USENIX Symposium on …, 2024 - usenix.org
Cloud services improve their availability by replicating data across sites in different
geographical regions. A variety of state-machine replication protocols have been proposed …