A systematic literature review on hardware reliability assessment methods for deep neural networks

MH Ahmadilivani, M Taheri, J Raik… - ACM Computing …, 2024 - dl.acm.org
Artificial Intelligence (AI) and, in particular, Machine Learning (ML), have emerged to be
utilized in various applications due to their capability to learn how to solve complex …

Druto: Upper-bounding silent data corruption vulnerability in GPU applications

MH Rahman, S Di, S Guo, X Lu, G Li… - 2024 IEEE …, 2024 - ieeexplore.ieee.org
Due to the increasing scale of high-performance computing (HPC) systems, transient
hardware faults have become a major reliability concern. Consequently, Silent Data …

Investigating the impact of transient hardware faults on deep learning neural network inference

MH Rahman, S Laskar, G Li - Software Testing, Verification and …, 2024 - Wiley Online Library
Safety‐critical applications, such as autonomous vehicles, healthcare, and space
applications, have witnessed widespread deployment of deep neural networks (DNNs) …

DeepVigor+: Scalable and Accurate Semi-Analytical Fault Resilience Analysis for Deep Neural Network

MH Ahmadilivani, J Raik, M Daneshtalab… - arXiv preprint arXiv …, 2024 - arxiv.org
Growing exploitation of Machine Learning (ML) in safety-critical applications necessitates
rigorous safety analysis. Hardware reliability assessment is a major concern with respect to …