Anomaly detection and failure root cause analysis in (micro) service-based cloud applications: A survey

J Soldani, A Brogi - ACM Computing Surveys (CSUR), 2022 - dl.acm.org
The proliferation of services and service interactions within microservices and cloud-native
applications makes it harder to detect failures and to identify their possible root causes …

AI-enabled secure microservices in edge computing: Opportunities and challenges

F Al-Doghman, N Moustafa, I Khalil… - IEEE Transactions …, 2022 - ieeexplore.ieee.org
The paradigm of edge computing has formed an innovative scope within the domain of the
Internet of Things (IoT) through expanding the services of the cloud to the network edge to …

Semi-supervised log-based anomaly detection via probabilistic label estimation

L Yang, J Chen, Z Wang, W Wang… - 2021 IEEE/ACM …, 2021 - ieeexplore.ieee.org
With the growth of software systems, logs have become an important data to aid system
maintenance. Log-based anomaly detection is one of the most important methods for such …

DeepTraLog: Trace-log combined microservice anomaly detection through graph-based deep learning

C Zhang, X Peng, C Sha, K Zhang, Z Fu, X Wu… - Proceedings of the 44th …, 2022 - dl.acm.org
A microservice system in industry is usually a large-scale distributed system consisting of
dozens to thousands of services running in different machines. An anomaly of the system …

A survey on automated log analysis for reliability engineering

S He, P He, Z Chen, T Yang, Y Su, MR Lyu - ACM Computing Surveys …, 2021 - dl.acm.org
Logs are semi-structured text generated by logging statements in software source code. In
recent decades, software logs have become imperative in the reliability assurance …

Eadro: An end-to-end troubleshooting framework for microservices on multi-source data

C Lee, T Yang, Z Chen, Y Su… - 2023 IEEE/ACM 45th …, 2023 - ieeexplore.ieee.org
The complexity and dynamism of microservices pose significant challenges to system
reliability, and thereby, automated troubleshooting is crucial. Effective root cause localization …

MicroHECL: High-efficient root cause localization in large-scale microservice systems

D Liu, C He, X Peng, F Lin, C Zhang… - 2021 IEEE/ACM …, 2021 - ieeexplore.ieee.org
Availability issues of industrial microservice systems (e.g., drop of successfully placed orders
and processed transactions) directly affect the running of the business. These issues are …

Experience report: Deep learning-based system log analysis for anomaly detection

Z Chen, J Liu, W Gu, Y Su, MR Lyu - arXiv preprint arXiv:2107.05908, 2021 - arxiv.org
Logs have been an imperative resource to ensure the reliability and continuity of many
software systems, especially large-scale distributed systems. They faithfully record runtime …

Unsupervised detection of microservice trace anomalies through service-level deep Bayesian networks

P Liu, H Xu, Q Ouyang, R Jiao, Z Chen… - 2020 IEEE 31st …, 2020 - ieeexplore.ieee.org
The anomalies of microservice invocation traces (traces) often indicate that the quality of the
microservice-based large software service is being impaired. However, timely and …

Practical root cause localization for microservice systems via trace analysis

Z Li, J Chen, R Jiao, N Zhao, Z Wang… - 2021 IEEE/ACM 29th …, 2021 - ieeexplore.ieee.org
Microservice architecture is applied by an increasing number of systems because of its
benefits on delivery, scalability, and autonomy. It is essential but challenging to localize root …