A systematic literature review on automated log abstraction techniques

D El-Masri, F Petrillo, YG Guéhéneuc… - Information and …, 2020 - Elsevier
Context: Logs are often the first and only information available to software engineers to
understand and debug their systems. Automated log-analysis techniques help software …

Log clustering based problem identification for online service systems

Q Lin, H Zhang, JG Lou, Y Zhang, X Chen - Proceedings of the 38th …, 2016 - dl.acm.org
Logs play an important role in the maintenance of large-scale online service systems. When
an online service fails, engineers need to examine recorded logs to gain insights into the …

Logram: Efficient Log Parsing Using n-Gram Dictionaries

H Dai, H Li, CS Chen, W Shang… - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Software systems usually record important runtime information in their logs. Logs help
practitioners understand system runtime behaviors and diagnose field failures. As logs are …

Leveraging existing instrumentation to automatically infer invariant-constrained models

I Beschastnikh, Y Brun, S Schneider, M Sloan… - Proceedings of the 19th …, 2011 - dl.acm.org
Computer systems are often difficult to debug and understand. A common way of gaining
insight into system behavior is to inspect execution logs and documentation. Unfortunately …

An improved KNN-based efficient log anomaly detection method with automatically labeled samples

S Ying, B Wang, L Wang, Q Li, Y Zhao… - ACM Transactions on …, 2021 - dl.acm.org
Logs that record system abnormal states (anomaly logs) can be regarded as outliers, and
the k-Nearest Neighbor (kNN) algorithm has relatively high accuracy in outlier detection …

Debugging distributed systems

I Beschastnikh, P Wang, Y Brun, MD Ernst - Communications of the ACM, 2016 - dl.acm.org
Debugging distributed systems. Communications of the ACM, August 2016, Vol. 59, No. 8,
Practice section. DOI:10.1145/2909480. Article development led by queue.acm.org …

Visualizing distributed system executions

I Beschastnikh, P Liu, A Xing, P Wang, Y Brun… - ACM Transactions on …, 2020 - dl.acm.org
Distributed systems pose unique challenges for software developers. Understanding the
system's communication topology and reasoning about concurrent activities of system hosts …

Online anomaly detection in HPC systems

A Borghesi, A Libri, L Benini… - 2019 IEEE International …, 2019 - ieeexplore.ieee.org
Reliability is a cumbersome problem in High Performance Computing Systems and Data
Centers evolution. During operation, several types of fault conditions or anomalies can arise …

Behavioral resource-aware model inference

T Ohmann, M Herzberg, S Fiss, A Halbert… - Proceedings of the 29th …, 2014 - dl.acm.org
Software bugs often arise because of differences between what developers think their
system does and what the system actually does. These differences frustrate debugging and …

Using declarative specification to improve the understanding, extensibility, and comparison of model-inference algorithms

I Beschastnikh, Y Brun, J Abrahamson… - IEEE Transactions …, 2014 - ieeexplore.ieee.org
It is a staple development practice to log system behavior. Numerous powerful model-
inference algorithms have been proposed to aid developers in log analysis and system …