LogAnomaly: Unsupervised detection of sequential and quantitative anomalies in unstructured logs

W Meng, Y Liu, Y Zhu, S Zhang, D Pei, Y Liu, Y Chen… - IJCAI, 2019 - nkcs.iops.ai
Recording runtime status via logs is common for almost every computer system, and detecting
anomalies in logs is crucial for identifying system malfunctions in a timely manner. However …

Automatic root cause analysis via large language models for cloud incidents

Y Chen, H Xie, M Ma, Y Kang, X Gao, L Shi… - Proceedings of the …, 2024 - dl.acm.org
Ensuring the reliability and availability of cloud services necessitates efficient root cause
analysis (RCA) for cloud incidents. Traditional RCA methods, which rely on manual …

Anomaly Detection in the Open World: Normality Shift Detection, Explanation, and Adaptation

D Han, Z Wang, W Chen, K Wang, R Yu, S Wang… - NDSS, 2023 - ndss-symposium.org
Concept drift is one of the most frustrating challenges for learning-based security
applications built on the closed-world assumption of identical distribution between training and …

ImDiffusion: Imputed diffusion models for multivariate time series anomaly detection

Y Chen, C Zhang, M Ma, Y Liu, R Ding, B Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Anomaly detection in multivariate time series data is of paramount importance for ensuring
the efficient operation of large-scale systems across diverse domains. However, accurately …

Assess and summarize: Improve outage understanding with large language models

P Jin, S Zhang, M Ma, H Li, Y Kang, L Li, Y Liu… - Proceedings of the 31st …, 2023 - dl.acm.org
Cloud systems have become increasingly popular in recent years due to their flexibility and
scalability. Each time cloud computing applications and services hosted on the cloud are …

Robust multimodal failure detection for microservice systems

C Zhao, M Ma, Z Zhong, S Zhang, Z Tan… - Proceedings of the 29th …, 2023 - dl.acm.org
Proactive failure detection of instances is vital to microservice systems because
an instance failure can propagate to the whole system and degrade the system's …

Diagnosing root causes of intermittent slow queries in cloud databases

M Ma, Z Yin, S Zhang, S Wang, C Zheng… - Proceedings of the …, 2020 - dl.acm.org
With the growing market of cloud databases, careful detection and elimination of slow
queries are of great importance to service stability. Previous studies focus on optimizing the …

Groot: An event-graph-based approach for root cause analysis in industrial settings

H Wang, Z Wu, H Jiang, Y Huang… - 2021 36th IEEE/ACM …, 2021 - ieeexplore.ieee.org
For large-scale distributed systems, it is crucial to efficiently diagnose the root causes of
incidents to maintain high system availability. The recent development of microservice …

Jump-Starting multivariate time series anomaly detection for online service systems

M Ma, S Zhang, J Chen, J Xu, H Li, Y Lin… - 2021 USENIX Annual …, 2021 - usenix.org
With the booming of online service systems, anomaly detection on multivariate time series,
such as a combination of CPU utilization, average response time, and requests per second …

Identifying bad software changes via multimodal anomaly detection for online service systems

N Zhao, J Chen, Z Yu, H Wang, J Li, B Qiu… - Proceedings of the 29th …, 2021 - dl.acm.org
In large-scale online service systems, software changes are inevitable and frequent. Due to
importing new code or configurations, changes are likely to incur incidents and destroy user …