A survey on deep learning for software engineering

Y Yang, X Xia, D Lo, J Grundy - ACM Computing Surveys (CSUR), 2022 - dl.acm.org
In 2006, Geoffrey Hinton proposed the concept of training “Deep Neural Networks (DNNs)”
and an improved model training method to break the bottleneck of neural network …

Semi-supervised log-based anomaly detection via probabilistic label estimation

L Yang, J Chen, Z Wang, W Wang… - 2021 IEEE/ACM …, 2021 - ieeexplore.ieee.org
With the growth of software systems, logs have become an important source of data to aid system
maintenance. Log-based anomaly detection is one of the most important methods for such …

Recommending root-cause and mitigation steps for cloud incidents using large language models

T Ahmed, S Ghosh, C Bansal… - 2023 IEEE/ACM 45th …, 2023 - ieeexplore.ieee.org
Incident management for cloud services is a complex process involving several steps and
has a huge impact on both service health and developer productivity. On-call engineers …

Deep learning library testing via effective model generation

Z Wang, M Yan, J Chen, S Liu, D Zhang - … of the 28th ACM Joint Meeting …, 2020 - dl.acm.org
Deep learning (DL) techniques have developed rapidly and have been widely adopted in
practice. However, similar to traditional software systems, DL systems also contain bugs …

Prioritizing test inputs for deep neural networks via mutation analysis

Z Wang, H You, J Chen, Y Zhang… - 2021 IEEE/ACM 43rd …, 2021 - ieeexplore.ieee.org
Deep Neural Network (DNN) testing is one of the most widely used ways to guarantee the
quality of DNNs. However, labeling test inputs to check the correctness of DNN prediction is …

Automated root causing of cloud incidents using in-context learning with GPT-4

X Zhang, S Ghosh, C Bansal, R Wang, M Ma… - … Proceedings of the …, 2024 - dl.acm.org
Root Cause Analysis (RCA) plays a pivotal role in the incident diagnosis process for cloud
services, requiring on-call engineers to identify the primary issues and implement corrective …

Towards intelligent incident management: why we need it and how we make it

Z Chen, Y Kang, L Li, X Zhang, H Zhang, H Xu… - Proceedings of the 28th …, 2020 - dl.acm.org
The management of cloud service incidents (unplanned interruptions or outages of a
service/product) greatly affects customer satisfaction and business revenue. After years of …

Muffin: Testing deep learning libraries via neural architecture fuzzing

J Gu, X Luo, Y Zhou, X Wang - … of the 44th International Conference on …, 2022 - dl.acm.org
Deep learning (DL) techniques have proven effective in many challenging tasks and have become
widely adopted in practice. However, previous work has shown that DL libraries, the basis of …

MonitorAssistant: Simplifying cloud service monitoring via large language models

Z Yu, M Ma, C Zhang, S Qin, Y Kang, C Bansal… - … Proceedings of the …, 2024 - dl.acm.org
In large-scale cloud service systems, monitoring metric data and conducting anomaly
detection is an important way to maintain reliability and stability. However, great disparity …

Assess and summarize: Improve outage understanding with large language models

P Jin, S Zhang, M Ma, H Li, Y Kang, L Li, Y Liu… - Proceedings of the 31st …, 2023 - dl.acm.org
Cloud systems have become increasingly popular in recent years due to their flexibility and
scalability. Each time cloud computing applications and services hosted on the cloud are …