Cloud-native computing: A survey from the perspective of services
The development of cloud computing delivery models inspires the emergence of cloud-native
computing. Cloud-native computing, as the most influential development principle for …
Xpert: Empowering incident management with query recommendations via large language models
Large-scale cloud systems play a pivotal role in modern IT infrastructure. However, incidents
occurring within these systems can lead to service disruptions and adversely affect user …
Assess and summarize: Improve outage understanding with large language models
Cloud systems have become increasingly popular in recent years due to their flexibility and
scalability. Each time cloud computing applications and services hosted on the cloud are …
How to fight production incidents? An empirical study on a large-scale cloud service
Production incidents in today's large-scale cloud services can be extremely expensive in
terms of customer impacts and engineering resources required to mitigate them. Despite …
A survey on intelligent management of alerts and incidents in IT services
Modern service systems are constantly improving with the development of various IT
technologies, leading to a boost in system scales and complex dependencies among …
Detection is better than cure: A cloud incidents perspective
Cloud providers use automated watchdogs or monitors to continuously observe service
availability and to proactively report incidents when system performance degrades. Improper …
Incident-aware duplicate ticket aggregation for cloud systems
In cloud systems, incidents are potential threats to customer satisfaction and business
revenue. When customers are affected by incidents, they often request customer support …
An intelligent framework for timely, accurate, and comprehensive cloud incident detection
Cloud incidents (service interruptions or performance degradation) dramatically degrade the
reliability of large-scale cloud systems, causing customer dissatisfaction and revenue loss …
Understanding and predicting incident mitigation time
Context: Incident management plays a significant role in online service systems. Incidents
should be mitigated as soon as possible in order to achieve high service stability. However …
Prism: Revealing hidden functional clusters from massive instances in cloud systems
Ensuring the reliability of cloud systems is critical for both cloud vendors and customers.
Cloud systems often rely on virtualization techniques to create instances of hardware …