Soft errors in DNN accelerators: A comprehensive review
Deep learning tasks cover a broad range of domains and an even more extensive range of
applications, from entertainment to extremely safety-critical fields. Thus, Deep Neural …
applications, from entertainment to extremely safety-critical fields. Thus, Deep Neural …
Analyzing and increasing the reliability of convolutional neural networks on GPUs
Graphics processing units (GPUs) are playing a critical role in convolutional neural networks
(CNNs) for image detection. As GPU-enabled CNNs move into safety-critical environments …
(CNNs) for image detection. As GPU-enabled CNNs move into safety-critical environments …
Artificial neural networks for space and safety-critical applications: Reliability issues and potential solutions
P Rech - IEEE Transactions on Nuclear Science, 2024 - ieeexplore.ieee.org
Machine learning is among the greatest advancements in computer science and
engineering and is today used to classify or detect objects, a key feature in autonomous …
engineering and is today used to classify or detect objects, a key feature in autonomous …
Understanding GPU errors on large-scale HPC systems and the implications for system design and operation
D Tiwari, S Gupta, J Rogers, D Maxwell… - 2015 IEEE 21st …, 2015 - ieeexplore.ieee.org
Increase in graphics hardware performance and improvements in programmability has
enabled GPUs to evolve from a graphics-specific accelerator to a general-purpose …
enabled GPUs to evolve from a graphics-specific accelerator to a general-purpose …
High energy and thermal neutron sensitivity of google tensor processing units
RLR Junior, S Malde, C Cazzaniga… - … on Nuclear Science, 2022 - ieeexplore.ieee.org
In this article, we investigate the reliability of Google's coral tensor processing units (TPUs)
to both high-energy atmospheric neutrons (at ChipIR) and thermal neutrons from a pulsed …
to both high-energy atmospheric neutrons (at ChipIR) and thermal neutrons from a pulsed …
Soft error resilience of deep residual networks for object recognition
Convolutional Neural Networks (CNNs) have truly gained attention in object recognition and
object classification in particular. When being implemented on Graphics Processing Units …
object classification in particular. When being implemented on Graphics Processing Units …
Evaluation and mitigation of radiation-induced soft errors in graphics processing units
Graphics processing units (GPUs) are increasingly attractive for both safety-critical and High-
Performance Computing applications. GPU reliability is a primary concern for both the …
Performance Computing applications. GPU reliability is a primary concern for both the …
Evaluation and mitigation of soft-errors in neural network-based object detection in three GPU architectures
FF dos Santos, L Draghetti, L Weigel… - 2017 47th Annual …, 2017 - ieeexplore.ieee.org
In this paper, we evaluate the reliability of the You Only Look Once (YOLO) object detection
framework. We have exposed to controlled neutron beams GPUs designed with three …
framework. We have exposed to controlled neutron beams GPUs designed with three …
Impact of GPUs parallelism management on safety-critical and HPC applications reliability
Graphics Processing Units (GPUs) offer high computational power but require high
scheduling strain to manage parallel processes, which increases the GPU cross section …
scheduling strain to manage parallel processes, which increases the GPU cross section …
Experimental and analytical study of xeon phi reliability
We present an in-depth analysis of transient faults effects on HPC applications in Intel Xeon
Phi processors based on radiation experiments and high-level fault injection. Besides …
Phi processors based on radiation experiments and high-level fault injection. Besides …