LightLog: A lightweight temporal convolutional network for log anomaly detection on the edge

Z Wang, J Tian, H Fang, L Chen, J Qin - Computer Networks, 2022 - Elsevier
Log anomaly detection on edge devices is the key to enhance edge security when
deploying IoT systems. Despite the success of many newly proposed deep learning based …

Smart predictive maintenance for high-performance computing systems: a literature review

ALCD Lima, VM Aranha, CJL Carvalho… - The Journal of …, 2021 - Springer
Predictive maintenance is an invaluable tool to preserve the health of mission critical assets
while minimizing the operational costs of scheduled intervention. Artificial intelligence …

RUAD: Unsupervised anomaly detection in HPC systems

M Molan, A Borghesi, D Cesarini, L Benini… - Future Generation …, 2023 - Elsevier
The increasing complexity of modern high-performance computing (HPC) systems
necessitates the introduction of automated and data-driven methodologies to support system …

Anomaly detection and anticipation in high performance computing systems

A Borghesi, M Molan, M Milano… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
In their quest toward Exascale, High Performance Computing (HPC) systems are rapidly
becoming larger and more complex, together with the issues concerning their maintenance …

Prodigy: Towards unsupervised anomaly detection in production HPC systems

B Aksar, E Sencan, B Schwaller, O Aaziz… - Proceedings of the …, 2023 - dl.acm.org
Performance variations caused by anomalies in modern High Performance Computing
(HPC) systems lead to decreased efficiency, impaired application performance, and …

pAElla: Edge AI-Based Real-Time Malware Detection in Data Centers

A Libri, A Bartolini, L Benini - IEEE Internet of Things Journal, 2020 - ieeexplore.ieee.org
The increasing use of Internet-of-Things (IoT) devices for monitoring a wide spectrum of
applications, along with the challenges of “big data” streaming support they often require for …

Paving the way toward energy-aware and automated datacentre

A Bartolini, F Beneventi, A Borghesi… - … Proceedings of the …, 2019 - dl.acm.org
Energy efficiency and datacentre automation are critical targets of the research and
deployment agenda of CINECA and its research partners in the Energy Efficient System …

Online machine learning for accelerating molecular dynamics modeling of cells

Z Zhang, P Zhang, C Han, G Cong… - Frontiers in Molecular …, 2022 - frontiersin.org
We developed a biomechanics-informed online learning framework to learn the dynamics
with ground truth generated with multiscale modeling simulation. It was built on Summit-like …

Runtime Performance Anomaly Diagnosis in Production HPC Systems Using Active Learning

B Aksar, E Sencan, B Schwaller, O Aaziz… - … on Parallel and …, 2024 - ieeexplore.ieee.org
With the increasing scale and complexity of High-Performance Computing (HPC) systems,
performance variations in applications caused by anomalies have become significant …

Clairvoyant: a log-based transformer-decoder for failure prediction in large-scale systems

KA Alharthi, A Jhumka, S Di, F Cappello - Proceedings of the 36th ACM …, 2022 - dl.acm.org
System failures are expected to be frequent in the exascale era such as current Petascale
systems. The health of such systems is usually determined from challenging analysis of …