The case for lifetime reliability-aware microprocessors

J Srinivasan, SV Adve, P Bose, JA Rivers - ACM SIGARCH Computer …, 2004 - dl.acm.org
Ensuring long processor lifetimes by limiting failures due to wear-out related hard errors is a
critical requirement for all microprocessor manufacturers. We observe that continuous device …

Relief: A reinforcement-learning-based real-time task assignment strategy in emerging fault-tolerant fog computing

R Siyadatzadeh, F Mehrafrooz, M Ansari… - IEEE Internet of …, 2023 - ieeexplore.ieee.org
Due to the real-time requirements in several IoT applications, fog computing has emerged to
overcome the long latency and other constraints of cloud computing. Due to the high …

An experimental study of reduced-voltage operation in modern FPGAs for neural network acceleration

B Salami, EB Onural, IE Yuksel, F Koc… - 2020 50th Annual …, 2020 - ieeexplore.ieee.org
We empirically evaluate an undervolting technique, i.e., underscaling the circuit supply
voltage below the nominal level, to improve the power-efficiency of Convolutional Neural …
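As background for the undervolting entries in this list, the usual first-order CMOS dynamic-power model is shown below. It is a textbook assumption, not a result from these papers. Here α is the switching activity factor, C the switched capacitance, V_dd the supply voltage and f the clock frequency. At a fixed clock, dynamic power scales with the square of the supply voltage, so running at 90% of nominal voltage reduces dynamic power to 81% of its nominal value. The model ignores leakage power, and it does not capture the timing faults that undervolting studies such as these measure.

```latex
P_{\mathrm{dyn}} = \alpha \, C \, V_{dd}^{2} \, f,
\qquad
\frac{P_{\mathrm{dyn}}'}{P_{\mathrm{dyn}}} = \left(\frac{V_{dd}'}{V_{dd}}\right)^{2}
\quad \text{(fixed } \alpha,\ C,\ f\text{)}
```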

Comprehensive evaluation of supply voltage underscaling in FPGA on-chip memories

B Salami, OS Unsal… - 2018 51st Annual IEEE …, 2018 - ieeexplore.ieee.org
In this work, we evaluate aggressive undervolting, i.e., voltage scaling below the nominal
level to reduce the energy consumption of Field Programmable Gate Arrays (FPGAs) …

Ring-DVFS: Reliability-aware reinforcement learning-based DVFS for real-time embedded systems

A Yeganeh-Khaksar, M Ansari, S Safari… - IEEE Embedded …, 2020 - ieeexplore.ieee.org
Dynamic voltage and frequency scaling (DVFS) is one of the most popular and exploited
techniques to reduce power consumption in multicore embedded systems. However, this …
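The trade-off that a DVFS governor exploits can be sketched in a few lines. This is a minimal, hypothetical illustration and not the reinforcement-learning, reliability-aware policy proposed in Ring-DVFS. It assumes a made-up voltage/frequency table (`OPP_TABLE`), a first-order energy model with leakage ignored (E ∝ V² × cycles), and a single task with a known cycle count and deadline. The names `OperatingPoint`, `dynamic_energy` and `pick_operating_point` are illustrative only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OperatingPoint:
    volts: float    # supply voltage (V)
    freq_hz: float  # clock frequency (Hz)


# Hypothetical voltage/frequency table; real values are platform-specific.
OPP_TABLE = [
    OperatingPoint(0.8, 600e6),
    OperatingPoint(0.9, 1.0e9),
    OperatingPoint(1.0, 1.4e9),
    OperatingPoint(1.1, 1.8e9),
]


def dynamic_energy(opp: OperatingPoint, cycles: float, alpha_c: float = 1e-9) -> float:
    """First-order dynamic energy, E = alpha*C * V^2 * cycles (leakage ignored)."""
    return alpha_c * opp.volts ** 2 * cycles


def pick_operating_point(cycles: float, deadline_s: float,
                         table=OPP_TABLE) -> Optional[OperatingPoint]:
    """Return the lowest-energy point whose run time meets the deadline, or None."""
    feasible = [opp for opp in table if cycles / opp.freq_hz <= deadline_s]
    if not feasible:
        return None
    return min(feasible, key=lambda opp: dynamic_energy(opp, cycles))


if __name__ == "__main__":
    # 1e9 cycles, 1.2 s deadline: 600 MHz needs ~1.67 s (misses), 1 GHz needs 1.0 s (meets).
    print(pick_operating_point(1e9, 1.2))
```

Under this model, energy per task depends on voltage but not on frequency, so the cheapest choice is the slowest point that still meets the deadline. Reliability-aware schemes such as the one above exist because lower voltages also increase soft-error susceptibility, which this sketch deliberately leaves out.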

Application and thermal-reliability-aware reinforcement learning based multi-core power management

SMP Dinakarrao, A Joseph, A Haridass… - ACM Journal on …, 2019 - dl.acm.org
Power management through dynamic voltage and frequency scaling (DVFS) is one of the
most widely adopted techniques. However, it impacts application reliability (due to soft …

Understanding power consumption and reliability of high-bandwidth memory with voltage underscaling

SSN Larimi, B Salami, OS Unsal… - … , Automation & Test …, 2021 - ieeexplore.ieee.org
Modern computing devices employ High-Bandwidth Memory (HBM) to meet their memory
bandwidth requirements. An HBM-enabled device consists of multiple DRAM layers stacked …

Hamartia: A fast and accurate error injection framework

CK Chang, S Lym, N Kelly… - 2018 48th Annual …, 2018 - ieeexplore.ieee.org
Single bit-flip has been the most popular error model for resilience studies with fault
injection. We use RTL gate-level fault injection to show that this model fails to cover many …
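The single bit-flip error model that this entry questions can be illustrated as follows. This is a generic sketch of the model applied to one 32-bit floating-point value, not Hamartia's interface. The helper `flip_bit_float32` is hypothetical. It reinterprets the value's IEEE-754 binary32 encoding as an integer, XORs one bit, and converts the result back.

```python
import struct


def flip_bit_float32(value: float, bit: int) -> float:
    """Return `value` (encoded as IEEE-754 binary32) with exactly one bit inverted.

    Bit 0 is the least-significant mantissa bit; bit 31 is the sign bit.
    This models the classic single bit-flip error in an architectural value.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in [0, 31]")
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return flipped


if __name__ == "__main__":
    print(flip_bit_float32(1.0, 31))  # sign bit flipped
    print(flip_bit_float32(1.0, 30))  # most-significant exponent bit flipped
```

The paper's argument is that gate-level faults often corrupt values in ways that this one-bit pattern does not cover. A model like this is therefore a baseline for comparison rather than a complete fault model.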

Evaluating and accelerating high-fidelity error injection for hpc

CK Chang, S Lym, N Kelly… - … conference for high …, 2018 - ieeexplore.ieee.org
We address two important concerns for the analysis of the behavior of applications in the
presence of hardware errors: (1) when is it important to model how hardware faults lead to …

Asymmetric resilience: Exploiting task-level idempotency for transient error recovery in accelerator-based systems

J Leng, A Buyuktosunoglu, R Bertran… - … Symposium on High …, 2020 - ieeexplore.ieee.org
Accelerators make the task of building systems that are resilient against transient errors like
voltage noise and soft errors hard. Architects integrate accelerators into the system as black …
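The general idea of recovering from a transient error by re-running an idempotent task can be sketched in software. This is a hypothetical illustration of the principle named in the title, not the paper's hardware or runtime mechanism. `TransientError`, `run_with_reexecution` and `make_flaky_scale` are invented names. The sketch assumes that errors are detected and reported as an exception, and that the task reads its inputs without modifying them and returns a fresh output.

```python
from typing import Callable, List, Sequence


class TransientError(Exception):
    """Hypothetical signal that a task observed a transient fault."""


def run_with_reexecution(task: Callable[[Sequence[float]], List[float]],
                         inputs: Sequence[float],
                         max_attempts: int = 3) -> List[float]:
    """Re-run an idempotent task until it succeeds or attempts run out.

    Re-execution is safe only because the task does not modify `inputs`
    and returns a new output, so a failed attempt leaves no partial state
    that would need to be rolled back.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return task(inputs)
        except TransientError as err:
            last_error = err
    raise RuntimeError(f"task failed after {max_attempts} attempts") from last_error


def make_flaky_scale(failures: int) -> Callable[[Sequence[float]], List[float]]:
    """Build a toy task that raises TransientError `failures` times, then succeeds."""
    remaining = [failures]  # error-injection counter only; the task's data path is pure

    def scale(xs: Sequence[float]) -> List[float]:
        if remaining[0] > 0:
            remaining[0] -= 1
            raise TransientError("simulated voltage-noise fault")
        return [2.0 * x for x in xs]  # writes a fresh list, never mutates xs

    return scale


if __name__ == "__main__":
    print(run_with_reexecution(make_flaky_scale(2), [1.0, 2.0]))
```

If the task instead updated its input buffer in place, a failed attempt could leave that buffer half-written, and simple re-execution would no longer give the correct result.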