- Academic Search

Q Yu, N Zhao, M Li, Z Li, H Wang, W Zhang… - Journal of Network and …, 2024 - Elsevier

Modern service systems are constantly improving with the development of various IT
technologies, leading to a boost in system scales and complex dependencies among …

Cited by 3. All 4 versions.

Assess and summarize: Improve outage understanding with large language models

P Jin, S Zhang, M Ma, H Li, Y Kang, L Li, Y Liu… - Proceedings of the 31st …, 2023 - dl.acm.org

Cloud systems have become increasingly popular in recent years due to their flexibility and
scalability. Each time cloud computing applications and services hosted on the cloud are …

Cited by 39. All 7 versions.

Knowledge-aware alert aggregation in large-scale cloud systems: a hybrid approach

J Kuang, J Liu, J Huang, R Zhong, J Gu, L Yu… - Proceedings of the 46th …, 2024 - dl.acm.org

Due to the scale and complexity of cloud systems, a system failure would trigger an "alert
storm", i.e., massive correlated alerts. Although these alerts can be traced back to a few root …

Cited by 6. All 5 versions.

A Miss Is as Good as A Mile: Metamorphic Testing for Deep Learning Operators

J Chen, C Jia, Y Yan, J Ge, H Zheng… - Proceedings of the ACM …, 2024 - dl.acm.org

Deep learning (DL) is a critical tool for real-world applications, and comprehensive testing of
DL models is vital to ensure their quality before deployment. However, recent studies have …

Cited by 3.

FaultProfIT: Hierarchical fault profiling of incident tickets in large-scale cloud systems

J Huang, J Liu, Z Chen, Z Jiang, Y Li, J Gu… - Proceedings of the 46th …, 2024 - dl.acm.org

Postmortem analysis is essential in the management of incidents within cloud systems,
which provides valuable insights to improve system's reliability and robustness. At CloudA1 …

Cited by 4. All 8 versions.

Graph based incident extraction and diagnosis in large-scale online systems

Z He, P Chen, Y Luo, Q Yan, H Chen, G Yu… - Proceedings of the 37th …, 2022 - dl.acm.org

With the ever increasing scale and complexity of online systems, incidents are gradually
becoming commonplace. Without appropriate handling, they can seriously harm the system …

Cited by 15.

TraceMesh: Scalable and streaming sampling for distributed traces

Z Chen, Z Jiang, Y Su, MR Lyu… - 2024 IEEE 17th …, 2024 - ieeexplore.ieee.org

Distributed tracing serves as a fundamental element in the monitoring of cloud-based and
datacenter systems. It provides visibility into the full life cycle of a request or operation across …

Cited by 5. All 7 versions.

Heterogeneous data-driven failure diagnosis for microservice-based industrial clouds toward consumer digital ecosystems

Y Xu, Z Qiu, H Gao, X Zhao, L Wang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Consumer digital ecosystems include a large volume of different types of applications, and
those applications are usually deployed in industrial cloud computing systems. Currently …

Cited by 7. All 4 versions.

Incident-aware duplicate ticket aggregation for cloud systems

J Liu, S He, Z Chen, L Li, Y Kang… - 2023 IEEE/ACM 45th …, 2023 - ieeexplore.ieee.org

In cloud systems, incidents are potential threats to customer satisfaction and business
revenue. When customers are affected by incidents, they often request customer support …

Cited by 10. All 5 versions.

Prism: Revealing hidden functional clusters from massive instances in cloud systems

J Liu, Z Jiang, J Gu, J Huang, Z Chen… - 2023 38th IEEE/ACM …, 2023 - ieeexplore.ieee.org

Ensuring the reliability of cloud systems is critical for both cloud vendors and customers.
Cloud systems often rely on virtualization techniques to create instances of hardware …

Cited by 5. All 8 versions.

Graph-based incident aggregation for large-scale online service systems

A survey on intelligent management of alerts and incidents in IT services
