Systems approaches to tackling configuration errors: A survey
In recent years, configuration errors (i.e., misconfigurations) have become one of the
dominant causes of system failures, resulting in many severe service outages and …
Simple testing can prevent most critical failures: An analysis of production failures in distributed Data-Intensive systems
Large, production quality distributed systems still fail periodically, and do so sometimes
catastrophically, where most or all users experience an outage or data loss. We present the …
An empirical study on configuration errors in commercial and open source systems
Configuration errors (i.e., misconfigurations) are among the dominant causes of system
failures. Their importance has inspired many research efforts on detecting, diagnosing, and …
X-ray: Automating Root-Cause diagnosis of performance anomalies in production software
M Attariyan, M Chow, J Flinn - 10th USENIX Symposium on Operating …, 2012 - usenix.org
Troubleshooting the performance of production software is challenging. Most existing tools,
such as profiling, tracing, and logging systems, reveal what events occurred during …
Do not blame users for misconfigurations
Similar to software bugs, configuration errors are also one of the major causes of today's
system failures. Many configuration issues manifest themselves in ways similar to software …
Toward fine-grained, unsupervised, scalable performance diagnosis for production cloud computing systems
Performance diagnosis is labor intensive in production cloud computing systems. Such
systems typically face many real-world challenges, which the existing diagnosis techniques …
Automating configuration troubleshooting with dynamic information flow analysis
M Attariyan, J Flinn - 9th USENIX Symposium on Operating Systems …, 2010 - usenix.org
Software misconfigurations are time-consuming and enormously frustrating to troubleshoot.
In this paper, we show that dynamic information flow analysis helps solve these problems by …
Metastable failures in the wild
Recently, Bronson et al. introduced a framework for understanding a class of failures in
distributed systems called metastable failures. The examples of metastable failures …
Challenges and opportunities: an in-depth empirical study on configuration error injection testing
Configuration error injection testing (CEIT) could systematically evaluate software reliability
and diagnosability to runtime configuration errors. This paper explores the challenges and …
Encore: Exploiting system environment and correlation information for misconfiguration detection
As software systems become more complex and configurable, failures due to
misconfigurations are becoming a critical problem. Such failures often have serious …