A systematic literature review on hardware reliability assessment methods for deep neural networks

MH Ahmadilivani, M Taheri, J Raik… - ACM Computing …, 2024 - dl.acm.org
Artificial Intelligence (AI) and, in particular, Machine Learning (ML), have emerged to be
utilized in various applications due to their capability to learn how to solve complex …

Artificial neural networks for space and safety-critical applications: Reliability issues and potential solutions

P Rech - IEEE Transactions on Nuclear Science, 2024 - ieeexplore.ieee.org
Machine learning is among the greatest advancements in computer science and
engineering and is today used to classify or detect objects, a key feature in autonomous …

High energy and thermal neutron sensitivity of Google tensor processing units

RLR Junior, S Malde, C Cazzaniga… - … on Nuclear Science, 2022 - ieeexplore.ieee.org
In this article, we investigate the reliability of Google's Coral tensor processing units (TPUs)
to both high-energy atmospheric neutrons (at ChipIR) and thermal neutrons from a pulsed …

Exploring hardware fault impacts on different real number representations of the structural resilience of TCUs in GPUs

R Limas Sierra, JD Guerrero-Balaguera, JER Condia… - Electronics, 2024 - mdpi.com
The most recent generations of graphics processing units (GPUs) boost the execution of
convolutional operations required by machine learning applications by resorting to …

Reliability exploration of system-on-chip with multi-bit-width accelerator for multi-precision deep neural networks

Q Cheng, M Huang, C Man, A Shen… - … on Circuits and …, 2023 - ieeexplore.ieee.org
Deep neural networks (DNNs) in safety-critical applications demand high reliability even
when running on edge-computing devices. Recent works on System-on-Chip (SoC) design …

Numerical behavior of NVIDIA tensor cores

M Fasi, NJ Higham, M Mikaitis, S Pranesh - PeerJ Computer Science, 2021 - peerj.com
We explore the floating-point arithmetic implemented in the NVIDIA tensor cores, which are
hardware accelerators for mixed-precision matrix multiplication available on the Volta …

DeepVigor: Vulnerability value ranges and factors for DNNs' reliability assessment

MH Ahmadilivani, M Taheri, J Raik… - 2023 IEEE European …, 2023 - ieeexplore.ieee.org
Deep Neural Networks (DNNs) and their accelerators are being deployed ever more
frequently in safety-critical applications leading to increasing reliability concerns. A …

Exploration of activation fault reliability in quantized systolic array-based DNN accelerators

M Taheri, N Cherezova, MS Ansari… - … on Quality Electronic …, 2024 - ieeexplore.ieee.org
The stringent requirements for the Deep Neural Networks (DNNs) accelerator's reliability
stand along with the need for reducing the computational burden on the hardware platforms …

Analyzing the impact of different real number formats on the structural reliability of TCUs in GPUs

RL Sierra, JD Guerrero-Balaguera… - 2023 IFIP/IEEE 31st …, 2023 - ieeexplore.ieee.org
Modern Graphics Processing Units (GPUs) boost the execution of tiled matrix
multiplications by extensively using in-chip accelerators (Tensor Core Units or TCUs) …

On the rise of AMD matrix cores: Performance, power efficiency, and programmability

G Schieffer, DA De Medeiros, J Faj… - … Analysis of Systems …, 2024 - ieeexplore.ieee.org
Matrix multiplication is a core computational part of deep learning and scientific workloads.
The emergence of Matrix Cores in high-end AMD GPUs, a building block of Exascale …