A survey of software log instrumentation

B Chen, ZM Jiang - ACM Computing Surveys (CSUR), 2021 - dl.acm.org
Log messages have been used widely in many software systems for a variety of purposes
during software development and field operation. There are two phases in software logging …

Pivot tracing: Dynamic causal monitoring for distributed systems

J Mace, R Roelke, R Fonseca - ACM Transactions on Computer Systems …, 2018 - dl.acm.org
Monitoring and troubleshooting distributed systems is notoriously difficult; potential problems
are complex, varied, and unpredictable. The monitoring and diagnosis tools commonly used …

Detecting anomalies in microservices with execution trace comparison

L Meng, F Ji, Y Sun, T Wang - Future Generation Computer Systems, 2021 - Elsevier
More and more developers and companies have adopted the concept of microservice.
Detecting anomalies and locating root causes are important for improving the reliability of …

Loghub: A large collection of system log datasets for AI-driven log analytics

J Zhu, S He, P He, J Liu, MR Lyu - 2023 IEEE 34th International …, 2023 - ieeexplore.ieee.org
Logs have been widely adopted in software system development and maintenance because
of the rich runtime information they record. In recent years, the increase of software size and …

An effective integrated method for learning big imbalanced data

M Ghanavati, RK Wong, F Chen… - … Congress on Big …, 2014 - ieeexplore.ieee.org
The imbalance of data has great effects on the performance of learning algorithms due to the
presence of under-represented data and severe class distribution skews. This is one of the …

Online reconstruction of structural information from datacenter logs

Z Chothia, J Liagouris, D Dimitrova… - Proceedings of the Twelfth …, 2017 - dl.acm.org
Well-run datacenter application architectures are heavily instrumented to provide detailed
traces of messages and remote invocations. Reconstructing user sessions, call graphs …

MicroCM: A cloud monitoring architecture for microservice invocation

R Wang, G Tian, S Ying - Computer Networks, 2024 - Elsevier
In the complex operating environment, it is difficult to find the root of the issue when there are
some time-consuming or error performance issues when executing the request. Aiming at …

Intent-Driven Multi-Engine Observability Dataflows for Heterogeneous Geo-Distributed Clouds

A Chakraborty, A Eswaran, P Thorat… - 2024 IEEE 17th …, 2024 - ieeexplore.ieee.org
With the growth of multi-cloud computing across a heterogeneous substrate of public cloud,
edge, and on-premise sites, observability has been gaining importance in comprehending …

A framework for on-line timing error detection in software systems

M Cinque, D Cotroneo, R Della Corte… - Future Generation …, 2019 - Elsevier
On-line timing error detection entails gathering and analyzing monitoring data to pinpoint
deviations from the expected timing behavior of a given software system. Current solutions …

A runtime verification based trace-oriented monitoring framework for cloud systems

J Zhou, Z Chen, J Wang, Z Zheng… - 2014 IEEE International …, 2014 - ieeexplore.ieee.org
Cloud computing provides a new paradigm for resource utilization and sharing. However,
the reliability problems, like system failures, often happen in cloud systems and bring …