Anomaly detection and failure root cause analysis in (micro)service-based cloud applications: A survey
The proliferation of services and service interactions within microservices and cloud-native
applications, makes it harder to detect failures and to identify their possible root causes …
AI for IT operations (AIOps) on cloud platforms: Reviews, opportunities and challenges
Artificial Intelligence for IT operations (AIOps) aims to combine the power of AI with the big
data generated by IT Operations processes, particularly in cloud infrastructures, to provide …
DeepTraLog: Trace-log combined microservice anomaly detection through graph-based deep learning
A microservice system in industry is usually a large-scale distributed system consisting of
dozens to thousands of services running in different machines. An anomaly of the system …
Root cause analysis of failures in microservices through causal discovery
Most cloud applications use a large number of smaller sub-components (called
microservices) that interact with each other in the form of a complex graph to provide the …
Eadro: An end-to-end troubleshooting framework for microservices on multi-source data
The complexity and dynamism of microservices pose significant challenges to system
reliability, and thereby, automated troubleshooting is crucial. Effective root cause localization …
CRISP: Critical path analysis of large-scale microservice architectures
Microservice architectures have become the lifeblood of modern service-oriented software
systems. Remote Procedure Calls (RPCs) among microservices are deeply nested …
Practical root cause localization for microservice systems via trace analysis
Microservice architecture is applied by an increasing number of systems because of its
benefits on delivery, scalability, and autonomy. It is essential but challenging to localize root …
Identifying bad software changes via multimodal anomaly detection for online service systems
In large-scale online service systems, software changes are inevitable and frequent. Due to
importing new code or configurations, changes are likely to incur incidents and destroy user …
Actionable and interpretable fault localization for recurring failures in online service systems
Fault localization is challenging in an online service system due to its monitoring data's large
volume and variety and complex dependencies across/within its components (e.g., services …
TimeAutoAD: Autonomous anomaly detection with self-supervised contrastive loss for multivariate time series
Multivariate time series (MTS) data are becoming increasingly ubiquitous in networked
systems, e.g., IoT systems and 5G networks. Anomaly detection in MTS refers to identifying …