An empirical study on change-induced incidents of online service systems

Y Wu, B Chai, Y Li, B Liu, J Li, Y Yang… - 2023 IEEE/ACM 45th …, 2023 - ieeexplore.ieee.org
Although dedicated efforts have been devoted to ensuring the service quality of online
service systems, these systems are still suffering from incidents due to various causes, which …

ChangeRCA: Finding Root Causes from Software Changes in Large Online Systems

G Yu, P Chen, Z He, Q Yan, Y Luo, F Li… - Proceedings of the ACM …, 2024 - dl.acm.org
In large-scale online service systems, the occurrence of software changes is inevitable and
frequent. Despite rigorous pre-deployment testing practices, the presence of defective …

Identifying erroneous software changes through self-supervised contrastive learning on time series data

X Wang, K Yin, Q Ouyang, X Wen… - 2022 IEEE 33rd …, 2022 - ieeexplore.ieee.org
Software changes are frequent and inevitable. However, erroneous software changes may
cause failures and incidents, degrading user experience and system stability. Thus, it is …

PORCA: Root cause analysis with partially observed data

C Gong, D Yao, J Wang, W Li, L Fang, Y Xie… - arXiv preprint arXiv …, 2024 - arxiv.org
Root Cause Analysis (RCA) aims at identifying the underlying causes of system faults by
uncovering and analyzing the causal structure from complex systems. It has been widely …

A Comprehensive Survey on Root Cause Analysis in (Micro) Services: Methodologies, Challenges, and Trends

T Wang, G Qi - arXiv preprint arXiv:2408.00803, 2024 - arxiv.org
The complex dependencies and propagative faults inherent in microservices, characterized
by a dense network of interconnected services, pose significant challenges in identifying the …

Understanding and Improving Change Risk Detection in Practice

Y Wu, Y Wang, J Li, Y Li, B Chai… - 2024 IEEE International …, 2024 - ieeexplore.ieee.org
Changes are inevitable and frequent in large-scale online service systems, which has been
one of the leading causes that induce incidents. Change risk detection (CRD) aims to help …