Design challenges of multi-UAV systems in cyber-physical applications: A comprehensive survey and future directions
Unmanned aerial vehicles (UAVs) have recently rapidly grown to facilitate a wide range of
innovative applications that can fundamentally change the way cyber-physical systems …
Mojim: A reliable and highly-available non-volatile memory system
Next-generation non-volatile memories (NVMs) promise DRAM-like performance,
persistence, and high density. They can attach directly to processors to form non-volatile …
Fault-tolerant and transactional stateful serverless workflows
This paper introduces Beldi, a library and runtime system for writing and composing fault-
tolerant and transactional stateful serverless functions. Beldi runs on existing providers and …
All about Eve: Execute-Verify replication for multi-core servers
This paper presents Eve, a new Execute-Verify architecture that allows state machine
replication to scale to multi-core servers. Eve departs from the traditional agree-execute …
Gray failure: The Achilles' heel of cloud-scale systems
Cloud scale provides the vast resources necessary to replace failed components, but this is
useful only if those failures can be detected. For this reason, the major availability …
NetBouncer: Active device and link failure localization in data center networks
The availability of data center services is jeopardized by various network incidents. One of
the biggest challenges for network incident handling is to accurately localize the failures …
What bugs live in the cloud? A study of 3000+ issues in cloud systems
We conduct a comprehensive study of development and deployment issues of six popular
and important cloud systems (Hadoop MapReduce, HDFS, HBase, Cassandra, ZooKeeper …
Microsecond consensus for microsecond applications
We consider the problem of making apps fault-tolerant through replication, when apps
operate at the microsecond scale, as in finance, embedded computing, and microservices …
Understanding and detecting software upgrade failures in distributed systems
Upgrade is one of the most disruptive yet unavoidable maintenance tasks that undermine
the availability of distributed systems. Any failure during an upgrade is catastrophic, as it …
Perseus: A fail-slow detection framework for cloud storage systems
The newly emerging "fail-slow" failures plague both software and hardware where the victim
components are still functioning yet with degraded performance. To address this problem …