A survey on automated log analysis for reliability engineering

S He, P He, Z Chen, T Yang, Y Su, MR Lyu - ACM computing surveys …, 2021 - dl.acm.org
Logs are semi-structured text generated by logging statements in software source code. In
recent decades, software logs have become imperative in the reliability assurance …
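To make "semi-structured" concrete, here is a minimal sketch of log abstraction under a simple assumption: variable parts of a message (numbers, hex values, IP addresses) can be recognized by regex and masked, leaving the constant template that came from the logging statement. The masking rules and the `abstract_line` function are hypothetical illustrations, not the method of any paper listed here; real log parsers use more robust techniques.

```python
import re

# Hypothetical masking rules (an assumption for illustration): any token that
# fully matches one of these patterns is treated as a variable parameter.
VARIABLE_PATTERNS = [
    re.compile(r"^\d+(\.\d+){3}(:\d+)?$"),  # IPv4 address with optional port
    re.compile(r"^0x[0-9a-fA-F]+$"),        # hexadecimal value
    re.compile(r"^-?\d+(\.\d+)?$"),         # integer or decimal number
]


def abstract_line(message: str) -> tuple[str, list[str]]:
    """Split a log message into (template, parameters) by masking variable tokens."""
    template_tokens: list[str] = []
    params: list[str] = []
    for token in message.split():
        if any(pattern.match(token) for pattern in VARIABLE_PATTERNS):
            template_tokens.append("<*>")  # placeholder for the variable part
            params.append(token)
        else:
            template_tokens.append(token)  # constant text from the logging statement
    return " ".join(template_tokens), params


if __name__ == "__main__":
    template, params = abstract_line("Received block 4821 of size 67108864 from 10.0.0.5")
    print(template)  # Received block <*> of size <*> from <*>
    print(params)    # ['4821', '67108864', '10.0.0.5']
```

Grouping many lines by their template is what turns free-form text into discrete event types that downstream analyses can count or sequence.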

A systematic literature review on automated log abstraction techniques

D El-Masri, F Petrillo, YG Guéhéneuc… - Information and …, 2020 - Elsevier
Context: Logs are often the first and only information available to software engineers to
understand and debug their systems. Automated log-analysis techniques help software …

DeepLog: Anomaly detection and diagnosis from system logs through deep learning

M Du, F Li, G Zheng, V Srikumar - … of the 2017 ACM SIGSAC conference …, 2017 - dl.acm.org
Anomaly detection is a critical step towards building a secure and trustworthy system. The
primary purpose of a system log is to record system states and significant events at various …

Spell: Streaming parsing of system event logs

M Du, F Li - 2016 IEEE 16th International Conference on Data …, 2016 - ieeexplore.ieee.org
System event logs have been frequently used as a valuable resource in data-driven
approaches to enhance system health and stability. A typical procedure in system log …

A survey of AIOps methods for failure management

P Notaro, J Cardoso, M Gerndt - ACM Transactions on Intelligent …, 2021 - dl.acm.org
Modern society is increasingly moving toward complex and distributed computing systems.
The increase in scale and complexity of these systems challenges O&M teams that perform …

Pivot tracing: Dynamic causal monitoring for distributed systems

J Mace, R Roelke, R Fonseca - ACM Transactions on Computer Systems …, 2018 - dl.acm.org
Monitoring and troubleshooting distributed systems is notoriously difficult; potential problems
are complex, varied, and unpredictable. The monitoring and diagnosis tools commonly used …

Identifying impactful service system problems via log analysis

S He, Q Lin, JG Lou, H Zhang, MR Lyu… - Proceedings of the 2018 …, 2018 - dl.acm.org
Logs are often used for troubleshooting in large-scale software systems. For a cloud-based
online system that provides 24/7 service, a huge number of logs could be generated every …

Canopy: An end-to-end performance tracing and analysis system

J Kaldor, J Mace, M Bejda, E Gao… - Proceedings of the 26th …, 2017 - dl.acm.org
This paper presents Canopy, Facebook's end-to-end performance tracing infrastructure.
Canopy records causally related performance data across the end-to-end execution path of …
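As a rough illustration of "causally related performance data across the end-to-end execution path", the sketch below assumes each hypothetical record carries its own ID, its parent's ID, a name, and a duration. It walks parent links to recover the causal path to an event and computes an event's self time. The `Event` type and both functions are assumptions for illustration only; they are not Canopy's data model or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Event:
    """A hypothetical trace record; field names are illustrative, not Canopy's schema."""
    event_id: str
    parent_id: Optional[str]  # None marks the root of a request
    name: str
    duration_ms: float


def causal_path(events: list[Event], event_id: str) -> list[str]:
    """Follow parent links from one event back to the request root; return names root-first."""
    by_id = {e.event_id: e for e in events}
    path: list[str] = []
    seen: set[str] = set()
    current = by_id.get(event_id)
    while current is not None:
        if current.event_id in seen:
            raise ValueError("parent links form a cycle")
        seen.add(current.event_id)
        path.append(current.name)
        current = by_id.get(current.parent_id) if current.parent_id is not None else None
    return path[::-1]


def self_time_ms(events: list[Event], event_id: str) -> float:
    """Event duration minus time spent in direct children (assumes children run sequentially)."""
    parent = next(e for e in events if e.event_id == event_id)
    child_time = sum(e.duration_ms for e in events if e.parent_id == event_id)
    return parent.duration_ms - child_time


if __name__ == "__main__":
    request = [
        Event("a", None, "frontend request", 120.0),
        Event("b", "a", "auth service", 20.0),
        Event("c", "a", "feed service", 80.0),
        Event("d", "c", "db query", 50.0),
    ]
    print(causal_path(request, "d"))   # ['frontend request', 'feed service', 'db query']
    print(self_time_ms(request, "a"))  # 20.0
```

The sequential-children assumption is a simplification; concurrent child work would need interval overlap handling instead of a plain sum.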

CloudSeer: Workflow monitoring of cloud infrastructures via interleaved logs

X Yu, P Joshi, J Xu, G Jin, H Zhang… - ACM SIGARCH Computer …, 2016 - dl.acm.org
Cloud infrastructures provide a rich set of management tasks that operate computing,
storage, and networking resources in the cloud. Monitoring the executions of these tasks is …

General LTL specification mining (T)

C Lemieux, D Park… - 2015 30th IEEE/ACM …, 2015 - ieeexplore.ieee.org
Temporal properties are useful for describing and reasoning about software behavior, but
developers rarely write down temporal specifications of their systems. Prior work on inferring …
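A deliberately simplified sketch of what a temporal property over logged behavior looks like: the response pattern G(p -> F q), read on finite traces as "every occurrence of p is followed, at the same or a later position, by q". The second function naively infers candidate specifications by keeping every event pair for which the pattern holds on all traces. Both functions are hypothetical illustrations that check one fixed template; this is not the general LTL mining approach of the paper above.

```python
from itertools import permutations


def holds_response(trace: list[str], trigger: str, response: str) -> bool:
    """Check G(trigger -> F response) on a finite trace of event names."""
    pending = False  # True while some trigger still awaits a response
    for event in trace:
        if event == trigger:
            pending = True
        if event == response:
            pending = False  # F includes the current position
    return not pending


def mine_response_pairs(traces: list[list[str]]) -> list[tuple[str, str]]:
    """Keep every ordered event pair (p, q) for which G(p -> F q) holds on all traces."""
    events = sorted({e for trace in traces for e in trace})
    return [
        (p, q)
        for p, q in permutations(events, 2)
        if all(holds_response(trace, p, q) for trace in traces)
    ]


if __name__ == "__main__":
    traces = [["open", "read", "close"], ["open", "close"]]
    print(mine_response_pairs(traces))  # [('open', 'close'), ('read', 'close')]
```

Enumerating pairs is quadratic in the number of event types and covers only one template; supporting arbitrary LTL formulas requires a general checker over many templates.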