Failure diagnosis in microservice systems: A comprehensive survey and analysis

S Zhang, S Xia, W Fan, B Shi, X Xiong… - ACM Transactions on …, 2024 - dl.acm.org
Widely adopted for their scalability and flexibility, modern microservice systems present
unique failure diagnosis challenges due to their independent deployment and dynamic …

CloudRCA: A root cause analysis framework for cloud computing platforms

Y Zhang, Z Guan, H Qian, L Xu, H Liu, Q Wen… - Proceedings of the 30th …, 2021 - dl.acm.org
As business of Alibaba expands across the world among various industries, higher
standards are imposed on the service quality and reliability of big data cloud computing …

Root cause analysis of anomalies of multitier services in public clouds

J Weng, JH Wang, J Yang… - IEEE/ACM Transactions on …, 2018 - ieeexplore.ieee.org
Anomalies of multitier services of one tenant running in cloud platform can be caused by the
tenant's own components or performance interference from other tenants. If the performance …

Real-time analysis of multiple root causes for anomalies assisted by digital twin in NFV environment

W Wang, L Tang, C Wang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Network Function Virtualization (NFV) is a promising paradigm that enables the employment
of novel service types with lower deployment cost and faster time-to-value, but it introduces …

TLS-WGAN-GP: A generative adversarial network model for data-driven fault root cause location

S Xu, X Xu, H Gao, F Xiao - IEEE Transactions on Consumer …, 2023 - ieeexplore.ieee.org
Data-driven intelligent fault root cause location is important to the reliability and safety of
network operation and maintenance. However, the number of fault samples is much greater …

Knowledge guided hierarchical multi-label classification over ticket data

C Zeng, W Zhou, T Li, L Shwartz… - IEEE Transactions on …, 2017 - ieeexplore.ieee.org
Maximal automation of routine IT maintenance procedures is an ultimate goal of IT service
management. System monitoring, an effective and reliable means for IT problem detection …

RADS: Real-time anomaly detection system for cloud data centres

S Barbhuiya, Z Papazachos, P Kilpatrick… - arXiv preprint arXiv …, 2018 - arxiv.org
Cybersecurity attacks in Cloud data centres are increasing alongside the growth of the
Cloud services market. Existing research proposes a number of anomaly detection systems …

Root cause analysis of noisy neighbors in a virtualized infrastructure

H Bouattour, YB Slimen, M Mechteri… - 2020 IEEE Wireless …, 2020 - ieeexplore.ieee.org
This paper proposes a model to identify the noise source in a virtualized infrastructure. This
phenomenon appears when network functions running under virtual machines that are …

Root cause detection using dynamic dependency graphs from time series data

SY Shah, XH Dang, P Zerfos - 2018 IEEE International …, 2018 - ieeexplore.ieee.org
Change detection in system behavior and its root cause detection is essential for many
large-scale systems such as, manufacturing plants, in order to keep systems running uninterrupted …

Automated traces-based anomaly detection and root cause analysis in cloud platforms

M Soualhia, F Wuhib - 2022 IEEE International Conference on …, 2022 - ieeexplore.ieee.org
Current cloud infrastructures and their applications are increasingly complex, with
confounding relationships among application elements and cloud infrastructure …