Systems approaches to tackling configuration errors: A survey
In recent years, configuration errors (i.e., misconfigurations) have become one of the
dominant causes of system failures, resulting in many severe service outages and …
Software configuration engineering in practice: Interviews, survey, and systematic literature review
Modern software applications are adapted to different situations (e.g., memory limits,
enabling/disabling features, database credentials) by changing the values of configuration …
Why does the cloud stop computing? Lessons from hundreds of service outages
We conducted a cloud outage study (COS) of 32 popular Internet services. We analyzed
1247 headline news and public post-mortem reports that detail 597 unplanned outages that …
Understanding and detecting real-world performance bugs
Developers frequently use inefficient code sequences that could be fixed by simple patches.
These inefficient code sequences can cause significant performance degradation and …
Face it yourselves: An LLM-based two-stage strategy to localize configuration errors via logs
Configurable software systems are prone to configuration errors, resulting in significant
losses to companies. However, diagnosing these errors is challenging due to the vast and …
Hey, you have given me too many knobs!: Understanding and dealing with over-designed configuration in system software
Configuration problems are not only prevalent, but also severely impair the reliability of
today's system software. One fundamental reason is the ever-increasing complexity of …
X-ray: Automating root-cause diagnosis of performance anomalies in production software
M Attariyan, M Chow, J Flinn - 10th USENIX Symposium on Operating …, 2012 - usenix.org
Troubleshooting the performance of production software is challenging. Most existing tools,
such as profiling, tracing, and logging systems, reveal what events occurred during …
An empirical study on configuration errors in commercial and open source systems
Configuration errors (i.e., misconfigurations) are among the dominant causes of system
failures. Their importance has inspired many research efforts on detecting, diagnosing, and …
libdft: Practical dynamic data flow tracking for commodity systems
Dynamic data flow tracking (DFT) deals with tagging and tracking data of interest as they
propagate during program execution. DFT has been repeatedly implemented by a variety of …
Fail-slow at scale: Evidence of hardware performance faults in large production systems
Fail-slow hardware is an under-studied failure mode. We present a study of 114 reports of
fail-slow hardware incidents, collected from large-scale cluster deployments in 14 …