Halfmoon: Log-optimal fault-tolerant stateful serverless computing
Serverless computing separates function execution from state management. Simple retry-
based fault tolerance might corrupt the shared state with duplicate updates. Existing …
Simulation for robotics test automation: Developer perspectives
Robotics simulation plays an important role in the design, development, and verification and
validation of robotics systems. Simulation represents a potentially cheaper, safer, and more …
What bugs cause production cloud incidents?
Cloud services have become the backbone of today's computing world. Runtime incidents,
which adversely affect the expected service operations, are extremely costly in terms of user …
Model checking guided testing for distributed systems
Distributed systems have become the backbone of cloud computing. Incorrect system
designs and implementations can greatly impair the reliability of distributed systems …
A comprehensive study on real world concurrency bugs in Node.js
Node.js becomes increasingly popular in building server-side JavaScript applications. It
adopts an event-driven model, which supports asynchronous I/O and non-deterministic …
An empirical study on crash recovery bugs in large-scale distributed systems
In large-scale distributed systems, node crashes are inevitable, and can happen at any time.
As such, distributed systems are usually designed to be resilient to these node crashes via …
FlyMC: Highly scalable testing of complex interleavings in distributed systems
We present a fast and scalable testing approach for datacenter/cloud systems such as
Cassandra, Hadoop, Spark, and ZooKeeper. The uniqueness of our approach is in its ability …
GoBench: A benchmark suite of real-world Go concurrency bugs
Go, a fast growing programming language, is often considered as “the programming
language of the cloud”. The language provides a rich set of synchronization primitives …
FlowDist: Multi-Staged Refinement-Based Dynamic Information Flow Analysis for Distributed Software Systems
Dynamic information flow analysis (DIFA) supports various security applications such as
malware analysis and vulnerability discovery. Yet traditional DIFA approaches have limited …
Performance bug analysis and detection for distributed storage and computing systems
This article systematically studies 99 distributed performance bugs from five widely deployed
distributed storage and computing systems (Cassandra, HBase, HDFS, Hadoop …