Applications of statistical causal inference in software engineering

J Siebert - Information and Software Technology, 2023 - Elsevier
Context: The aim of statistical causal inference (SCI) methods is to estimate causal effects
from observational data (i.e., when randomized controlled trials are not possible). In this …

Failure diagnosis in microservice systems: A comprehensive survey and analysis

S Zhang, S Xia, W Fan, B Shi, X Xiong… - ACM Transactions on …, 2024 - dl.acm.org
Widely adopted for their scalability and flexibility, modern microservice systems present
unique failure diagnosis challenges due to their independent deployment and dynamic …

Root cause analysis of failures in microservices through causal discovery

A Ikram, S Chakraborty, S Mitra… - Advances in …, 2022 - proceedings.neurips.cc
Most cloud applications use a large number of smaller sub-components (called
microservices) that interact with each other in the form of a complex graph to provide the …

Actionable and interpretable fault localization for recurring failures in online service systems

Z Li, N Zhao, M Li, X Lu, L Wang, D Chang… - Proceedings of the 30th …, 2022 - dl.acm.org
Fault localization is challenging in an online service system due to its monitoring data's large
volume and variety and complex dependencies across/within its components (e.g., services …

Autonomous selection of the fault classification models for diagnosing microservice applications

Y Song, R Xin, P Chen, R Zhang, J Chen… - Future Generation …, 2024 - Elsevier
Microservices architecture is a new approach for deploying applications and services in the
cloud, gaining popularity for constructing large-scale systems that are highly resilient, robust …

CausalRCA: Causal inference based precise fine-grained root cause localization for microservice applications

R Xin, P Chen, Z Zhao - Journal of Systems and Software, 2023 - Elsevier
Effectively localizing root causes of performance anomalies is crucial to enabling the rapid
recovery and loss mitigation of microservice applications in the cloud. Depending on the …

Nezha: Interpretable fine-grained root causes analysis for microservices on multi-modal observability data

G Yu, P Chen, Y Li, H Chen, X Li, Z Zheng - Proceedings of the 31st …, 2023 - dl.acm.org
Root cause analysis (RCA) in large-scale microservice systems is a critical and challenging
task. To understand and localize root causes of unexpected faults, modern observability …

Multivariate Log-based Anomaly Detection for Distributed Database

L Zhang, T Jia, M Jia, Y Li, Y Yang, Z Wu - Proceedings of the 30th ACM …, 2024 - dl.acm.org
Distributed databases are fundamental infrastructures of today's large-scale software
systems such as cloud systems. Detecting anomalies in distributed databases is essential …

Baro: Robust root cause analysis for microservices via multivariate Bayesian online change point detection

L Pham, H Ha, H Zhang - Proceedings of the ACM on Software …, 2024 - dl.acm.org
Detecting failures and identifying their root causes promptly and accurately is crucial for
ensuring the availability of microservice systems. A typical failure troubleshooting pipeline …

HeMiRCA: Fine-grained root cause analysis for microservices with heterogeneous data sources

Z Zhu, C Lee, X Tang, P He - ACM Transactions on Software …, 2024 - dl.acm.org
Microservices architecture improves software scalability, resilience, and agility but also
poses significant challenges to system reliability due to their complexity and dynamic nature …