Understanding error propagation in deep learning neural network (DNN) accelerators and applications
Deep learning neural networks (DNNs) have been successful in solving a wide range of
machine learning problems. Specialized hardware accelerators have been proposed to …
machine learning problems. Specialized hardware accelerators have been proposed to …
SASSIFI: An architecture-level fault injection tool for GPU application resilience evaluation
As GPUs become more pervasive in both scalable high-performance computing systems
and safety-critical embedded systems, evaluating and analyzing their resilience to soft errors …
and safety-critical embedded systems, evaluating and analyzing their resilience to soft errors …
Demystifying the system vulnerability stack: Transient fault effects across the layers
In this paper, we revisit the system vulnerability stack for transient faults. We reveal severe
pitfalls in widely used vulnerability measurement approaches, which separate the hardware …
pitfalls in widely used vulnerability measurement approaches, which separate the hardware …
Artificial neural networks for space and safety-critical applications: Reliability issues and potential solutions
P Rech - IEEE Transactions on Nuclear Science, 2024 - ieeexplore.ieee.org
Machine learning is among the greatest advancements in computer science and
engineering and is today used to classify or detect objects, a key feature in autonomous …
engineering and is today used to classify or detect objects, a key feature in autonomous …
BinFI an efficient fault injector for safety-critical machine learning systems
As machine learning (ML) becomes pervasive in high performance computing, ML has
found its way into safety-critical domains (eg, autonomous vehicles). Thus the reliability of …
found its way into safety-critical domains (eg, autonomous vehicles). Thus the reliability of …
A low-cost fault corrector for deep neural networks through range restriction
The adoption of deep neural networks (DNNs) in safety-critical domains has engendered
serious reliability concerns. A prominent example is hardware transient faults that are …
serious reliability concerns. A prominent example is hardware transient faults that are …
Modeling soft-error propagation in programs
As technology scales to lower feature sizes, devices become more susceptible to soft errors.
Soft errors can lead to silent data corruptions (SDCs), seriously compromising the reliability …
Soft errors can lead to silent data corruptions (SDCs), seriously compromising the reliability …
Tensorfi: A flexible fault injection framework for tensorflow applications
As machine learning (ML) has seen increasing adoption in safety-critical domains (eg,
autonomous vehicles), the reliability of ML systems has also grown in importance. While …
autonomous vehicles), the reliability of ML systems has also grown in importance. While …
Avgi: Microarchitecture-driven, fast and accurate vulnerability assessment
We propose AVGI, a new Statistical Fault Injection (SFI)-based methodology, which delivers
orders of magnitude faster assessment of the Architectural Vulnerability Factor (AVF) of a …
orders of magnitude faster assessment of the Architectural Vulnerability Factor (AVF) of a …
Llfi: An intermediate code-level fault injection tool for hardware faults
Q Lu, M Farahani, J Wei, A Thomas… - … on Software Quality …, 2015 - ieeexplore.ieee.org
Hardware errors are becoming more prominent with reducing feature sizes, however
tolerating them exclusively in hardware is expensive. Researchers have explored software …
tolerating them exclusively in hardware is expensive. Researchers have explored software …