Anomaly detection and failure root cause analysis in (micro) service-based cloud applications: A survey

J Soldani, A Brogi - ACM Computing Surveys (CSUR), 2022 - dl.acm.org
The proliferation of services and service interactions within microservices and cloud-native
applications makes it harder to detect failures and to identify their possible root causes …

A survey on device behavior fingerprinting: Data sources, techniques, application scenarios, and datasets

PMS Sánchez, JMJ Valero, AH Celdrán… - … Surveys & Tutorials, 2021 - ieeexplore.ieee.org
In the current network-based computing world, where the number of interconnected devices
grows exponentially, their diversity, malfunctions, and cybersecurity threats are increasing at …

MST-GAT: A multimodal spatial–temporal graph attention network for time series anomaly detection

C Ding, S Sun, J Zhao - Information Fusion, 2023 - Elsevier
Multimodal time series (MTS) anomaly detection is crucial for maintaining the safety and
stability of working devices (e.g., water treatment system and spacecraft), whose data are …

DeepTraLog: Trace-log combined microservice anomaly detection through graph-based deep learning

C Zhang, X Peng, C Sha, K Zhang, Z Fu, X Wu… - Proceedings of the 44th …, 2022 - dl.acm.org
A microservice system in industry is usually a large-scale distributed system consisting of
dozens to thousands of services running in different machines. An anomaly of the system …

Eadro: An end-to-end troubleshooting framework for microservices on multi-source data

C Lee, T Yang, Z Chen, Y Su… - 2023 IEEE/ACM 45th …, 2023 - ieeexplore.ieee.org
The complexity and dynamism of microservices pose significant challenges to system
reliability, and thereby, automated troubleshooting is crucial. Effective root cause localization …

Unsupervised detection of microservice trace anomalies through service-level deep Bayesian networks

P Liu, H Xu, Q Ouyang, R Jiao, Z Chen… - 2020 IEEE 31st …, 2020 - ieeexplore.ieee.org
The anomalies of microservice invocation traces (traces) often indicate that the quality of the
microservice-based large software service is being impaired. However, timely and …

Robust multimodal failure detection for microservice systems

C Zhao, M Ma, Z Zhong, S Zhang, Z Tan… - Proceedings of the 29th …, 2023 - dl.acm.org
Proactive failure detection of instances is vitally essential to microservice systems because
an instance failure can propagate to the whole system and degrade the system's …

A semi-supervised VAE based active anomaly detection framework in multivariate time series for online systems

T Huang, P Chen, R Li - Proceedings of the ACM Web Conference 2022, 2022 - dl.acm.org
Nowadays, the large online systems are constructed on the basis of microservice
architecture. A failure in this architecture may cause a series of failures due to the fault …

AI for IT operations (AIOps) on cloud platforms: Reviews, opportunities and challenges

Q Cheng, D Sahoo, A Saha, W Yang, C Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Artificial Intelligence for IT operations (AIOps) aims to combine the power of AI with the big
data generated by IT Operations processes, particularly in cloud infrastructures, to provide …

Identifying bad software changes via multimodal anomaly detection for online service systems

N Zhao, J Chen, Z Yu, H Wang, J Li, B Qiu… - Proceedings of the 29th …, 2021 - dl.acm.org
In large-scale online service systems, software changes are inevitable and frequent. Due to
importing new code or configurations, changes are likely to incur incidents and destroy user …