Halfmoon: Log-optimal fault-tolerant stateful serverless computing
Serverless computing separates function execution from state management. Simple retry-
based fault tolerance might corrupt the shared state with duplicate updates. Existing …
Greybox fuzzing of distributed systems
Grey-box fuzzing is the lightweight approach of choice for finding bugs in sequential
programs. It provides a balance between efficiency and effectiveness by conducting a …
Model checking guided testing for distributed systems
Distributed systems have become the backbone of cloud computing. Incorrect system
designs and implementations can greatly impair the reliability of distributed systems …
Automatic reliability testing for cluster management controllers
Modern cluster managers like Borg, Omega and Kubernetes rely on the state-reconciliation
principle to be highly resilient and extensible. In these systems, all cluster-management …
GoBench: A benchmark suite of real-world Go concurrency bugs
Go, a fast growing programming language, is often considered as “the programming
language of the cloud”. The language provides a rich set of synchronization primitives …
Service-level fault injection testing
Companies today increasingly rely on microservice architectures to deliver service for their
large-scale mobile or web applications. However, not all developers working on these …
Compiling distributed system models with PGo
F Hackett, S Hosseini, R Costa, M Do… - Proceedings of the 28th …, 2023 - dl.acm.org
Distributed systems are difficult to design and implement correctly. In response, both
research and industry are exploring applications of formal methods to distributed systems. A …
CoFI: Consistency-guided fault injection for cloud systems
Network partitions are inevitable in large-scale cloud systems. Despite developer's efforts in
handling network partitions throughout designing, implementing and testing cloud systems …
SandTable: Scalable Distributed System Model Checking with Specification-Level State Exploration
Implementation-level distributed system model checkers (DMCKs) have proven valuable in
verifying the correctness of real distributed systems. However, they primarily focus on state …
Chronos: Finding timeout bugs in practical distributed systems by deep-priority fuzzing with transient delay
Delays are inevitable in complex distributed environments. Timeout mechanisms are
commonly used to handle unexpected failures in distributed systems. However, incorrect …