Demystifying the system vulnerability stack: Transient fault effects across the layers
In this paper, we revisit the system vulnerability stack for transient faults. We reveal severe
pitfalls in widely used vulnerability measurement approaches, which separate the hardware …
Artificial neural networks for space and safety-critical applications: Reliability issues and potential solutions
P Rech - IEEE Transactions on Nuclear Science, 2024 - ieeexplore.ieee.org
Machine learning is among the greatest advancements in computer science and
engineering and is today used to classify or detect objects, a key feature in autonomous …
Understanding and mitigating hardware failures in deep learning training systems
Y He, M Hutton, S Chan, R De Gruijl… - Proceedings of the 50th …, 2023 - dl.acm.org
Deep neural network (DNN) training workloads are increasingly susceptible to hardware
failures in datacenters. For example, Google experienced "mysterious, difficult to identify …
AVGI: Microarchitecture-driven, fast and accurate vulnerability assessment
We propose AVGI, a new Statistical Fault Injection (SFI)-based methodology, which delivers
orders of magnitude faster assessment of the Architectural Vulnerability Factor (AVF) of a …
Survey on Redundancy Based-Fault tolerance methods for Processors and Hardware accelerators-Trends in Quantum Computing, Heterogeneous Systems and …
S Venkatesha, R Parthasarathi - ACM Computing Surveys, 2024 - dl.acm.org
Rapid progress in the CMOS technology for the past 25 years has increased the
vulnerability of processors towards faults. Subsequently, focus of computer architects shifted …
Soft error effects on Arm microprocessors: Early estimations versus chip measurements
PR Bodmann, G Papadimitriou… - IEEE Transactions …, 2021 - ieeexplore.ieee.org
Extensive research efforts are being carried out to evaluate and improve the reliability of
computing devices either through beam experiments or simulation-based fault injection …
Impact of voltage scaling on soft errors susceptibility of multicore server CPUs
Microprocessor power consumption and dependability are both crucial challenges that
designers have to cope with due to shrinking feature sizes and increasing transistor counts …
Harpocrates: Breaking the silence of CPU faults through hardware-in-the-loop program generation
N Karystinos, O Chatzopoulos… - 2024 ACM/IEEE 51st …, 2024 - ieeexplore.ieee.org
Several hyperscalers have recently disclosed the occurrence of Silent Data Corruptions
(SDCs) in their systems fleets, sparking concerns about the severity of known and the …
Revealing GPUs vulnerabilities by combining register-transfer and software-level fault injection
The complexity of both hardware and software makes GPUs reliability evaluation extremely
challenging. A low level fault injection on a GPU model, despite being accurate, would take …
Silent data errors: Sources, detection, and modeling
A Singh, S Chakravarty, G Papadimitriou… - 2023 IEEE 41st VLSI …, 2023 - ieeexplore.ieee.org
Chip manufacturers and hyperscalers are becoming increasingly aware of the problem
posed by Silent Data Errors (SDE) and are taking steps to address it. Major computing …