Halfmoon: Log-optimal fault-tolerant stateful serverless computing

S Qi, X Liu, X ** - Proceedings of the 29th Symposium on Operating …, 2023 - dl.acm.org
Serverless computing separates function execution from state management. Simple retry-based
fault tolerance might corrupt the shared state with duplicate updates. Existing …

Simulation for robotics test automation: Developer perspectives

A Afzal, DS Katz, C Le Goues… - 2021 14th IEEE …, 2021 - ieeexplore.ieee.org
Robotics simulation plays an important role in the design, development, and verification and
validation of robotics systems. Simulation represents a potentially cheaper, safer, and more …

What bugs cause production cloud incidents?

H Liu, S Lu, M Musuvathi, S Nath - Proceedings of the Workshop on Hot …, 2019 - dl.acm.org
Cloud services have become the backbone of today's computing world. Runtime incidents,
which adversely affect the expected service operations, are extremely costly in terms of user …

Model checking guided testing for distributed systems

D Wang, W Dou, Y Gao, C Wu, J Wei… - Proceedings of the …, 2023 - dl.acm.org
Distributed systems have become the backbone of cloud computing. Incorrect system
designs and implementations can greatly impair the reliability of distributed systems …

A comprehensive study on real world concurrency bugs in Node.js

J Wang, W Dou, Y Gao, C Gao, F Qin… - 2017 32nd IEEE/ACM …, 2017 - ieeexplore.ieee.org
Node.js is becoming increasingly popular for building server-side JavaScript applications. It
adopts an event-driven model, which supports asynchronous I/O and non-deterministic …

An empirical study on crash recovery bugs in large-scale distributed systems

Y Gao, W Dou, F Qin, C Gao, D Wang, J Wei… - Proceedings of the …, 2018 - dl.acm.org
In large-scale distributed systems, node crashes are inevitable, and can happen at any time.
As such, distributed systems are usually designed to be resilient to these node crashes via …

FlyMC: Highly scalable testing of complex interleavings in distributed systems

JF Lukman, H Ke, CA Stuardo, RO Suminto… - Proceedings of the …, 2019 - dl.acm.org
We present a fast and scalable testing approach for datacenter/cloud systems such as
Cassandra, Hadoop, Spark, and ZooKeeper. The uniqueness of our approach is in its ability …

GoBench: A benchmark suite of real-world Go concurrency bugs

T Yuan, G Li, J Lu, C Liu, L Li… - 2021 IEEE/ACM …, 2021 - ieeexplore.ieee.org
Go, a fast-growing programming language, is often considered “the programming
language of the cloud”. The language provides a rich set of synchronization primitives …

FlowDist: Multi-Staged Refinement-Based Dynamic Information Flow Analysis for Distributed Software Systems

X Fu, H Cai - 30th USENIX Security Symposium (USENIX Security 21 …, 2021 - usenix.org
Dynamic information flow analysis (DIFA) supports various security applications such as
malware analysis and vulnerability discovery. Yet traditional DIFA approaches have limited …

Performance bug analysis and detection for distributed storage and computing systems

J Li, Y Zhang, S Lu, HS Gunawi, X Gu… - ACM Transactions on …, 2023 - dl.acm.org
This article systematically studies 99 distributed performance bugs from five widely deployed
distributed storage and computing systems (Cassandra, HBase, HDFS, Hadoop …