A survey on automated log analysis for reliability engineering

S He, P He, Z Chen, T Yang, Y Su, MR Lyu - ACM computing surveys …, 2021 - dl.acm.org
Logs are semi-structured text generated by logging statements in software source code. In
recent decades, software logs have become imperative in the reliability assurance …

Survey on models and techniques for root-cause analysis

M Solé, V Muntés-Mulero, AI Rana… - arXiv preprint arXiv …, 2017 - arxiv.org
Automation and computer intelligence to support complex human decisions becomes
essential to manage large and distributed systems in the Cloud and IoT era. Understanding …

The impact of integrating information technology with operational technology in physical assets: a literature review

A Kok, A Martinetti, J Braaksma - IEEE Access, 2024 - ieeexplore.ieee.org
The convergence of information technology (IT) with operational technology (OT), within
physical assets can enhance performance but also presents challenges due to higher …

Underwater robotic systems: systems, technologies, applications

АВ Инзарцев, ЛВ Киселев, ВВ Костенко… - 2018 - elibrary.ru
The monograph addresses the problems of developing and practically applying
underwater robotic systems, including autonomous and …

Experience report: Anomaly detection of cloud application operations using log and cloud metric correlation analysis

M Farshchi, JG Schneider, I Weber… - 2015 IEEE 26th …, 2015 - ieeexplore.ieee.org
Failure of application operations is one of the main causes of system-wide outages in cloud
environments. This particularly applies to DevOps operations, such as backup …

Metric selection and anomaly detection for cloud operations using log and metric correlation analysis

M Farshchi, JG Schneider, I Weber, J Grundy - Journal of Systems and …, 2018 - Elsevier
Cloud computing systems provide the facilities to make application services resilient against
failures of individual computing resources. However, resiliency is typically limited by a cloud …

Identifying linked incidents in large-scale online service systems

Y Chen, X Yang, H Dong, X He, H Zhang… - Proceedings of the 28th …, 2020 - dl.acm.org
In large-scale online service systems, incidents occur frequently due to a variety of causes,
from updates of software and hardware to changes in operation environment. These …

Multi-agent Systems: A survey about its components, framework and workflow

D Maldonado, E Cruz, JA Torres, PJ Cruz… - IEEE …, 2024 - ieeexplore.ieee.org
With the rapid technological advancements and the ever-evolving complex systems, the
identification and integration of the components and resources for the functioning of multi …

POD-Diagnosis: Error diagnosis of sporadic operations on cloud applications

X Xu, L Zhu, I Weber, L Bass… - 2014 44th Annual IEEE/IFIP …, 2014 - ieeexplore.ieee.org
Applications in the cloud are subject to sporadic changes due to operational activities such
as upgrade, redeployment, and on-demand scaling. These operations are also subject to …

A neuro-symbolic approach for anomaly detection and complex fault diagnosis exemplified in the automotive domain

T Bohne, AKP Windler, M Atzmueller - Proceedings of the 12th …, 2023 - dl.acm.org
This paper presents an iterative, hybrid neuro-symbolic approach for anomaly detection and
complex fault diagnosis, enabling knowledge-based (symbolic) methods to complement …