Demystifying and Extracting Fault-indicating Information from Logs for Failure Diagnosis

J Huang, Z Jiang, J Liu, Y Huo, J Gu… - 2024 IEEE 35th …, 2024 - ieeexplore.ieee.org
Logs are imperative in the maintenance of online service systems, which often encompass
important information for effective failure mitigation. While existing anomaly detection …

The Current Challenges of Software Engineering in the Era of Large Language Models

C Gao, X Hu, S Gao, X Xia, Z Jin - ACM Transactions on Software …, 2024 - dl.acm.org
With the advent of large language models (LLMs) in the artificial intelligence (AI) area, the
field of software engineering (SE) has also witnessed a paradigm shift. These models, by …

AI Assistants for Incident Lifecycle in a Microservice Environment: A Systematic Literature Review

DZ Zhou, M Fokaefs - arXiv preprint arXiv:2410.04334, 2024 - arxiv.org
Incidents in microservice environments can be costly and challenging to recover from due to
their complexity and distributed nature. Recent advancements in artificial intelligence (AI) …

Multi-source KPIs' root cause localization in online service systems

H **a, J Xu, B **ao, H Jia, C Gao… - … on Networking and …, 2024 - ieeexplore.ieee.org
Root cause localization is challenging because of the large number of monitoring metrics
and the many types of faults in an online service system extended by a microservices …