Automatic root cause analysis via large language models for cloud incidents

Y Chen, H Xie, M Ma, Y Kang, X Gao, L Shi… - Proceedings of the …, 2024 - dl.acm.org
Ensuring the reliability and availability of cloud services necessitates efficient root cause
analysis (RCA) for cloud incidents. Traditional RCA methods, which rely on manual …

MonitorAssistant: Simplifying cloud service monitoring via large language models

Z Yu, M Ma, C Zhang, S Qin, Y Kang, C Bansal… - … Proceedings of the …, 2024 - dl.acm.org
In large-scale cloud service systems, monitoring metric data and conducting anomaly
detection is an important way to maintain reliability and stability. However, great disparity …

Can We Trust Auto-Mitigation? Improving Cloud Failure Prediction with Uncertain Positive Learning

H Li, M Ma, Y Liu, P Zhao, S Li, Z Li… - 2024 IEEE 35th …, 2024 - ieeexplore.ieee.org
In the rapidly expanding domain of cloud computing, a variety of software services have
been deployed in the cloud. To ensure the reliability of cloud services, prior studies focus on …

Early Bird: Ensuring Reliability of Cloud Systems Through Early Failure Prediction

Y Liu, M Ma, P Zhao, T Li, B Qiao, S Li… - 2024 IEEE 35th …, 2024 - ieeexplore.ieee.org
As cloud service continues to dominate various sectors, the reliability of cloud infrastructures
becomes crucial. Traditional methods of failure prediction often fall short in providing …