Survey on redundancy based-fault tolerance methods for processors and hardware accelerators - trends in quantum computing, heterogeneous systems and reliability

S Venkatesha, R Parthasarathi - ACM Computing Surveys, 2024 - dl.acm.org
Rapid progress in CMOS technology since the late 1990s has increased the vulnerability of
processors toward faults. Subsequently, the focus of computer architects has shifted toward …

A survey on multithreading alternatives for soft error fault tolerance

I Oz, S Arslan - ACM Computing Surveys (CSUR), 2019 - dl.acm.org
Smaller transistor sizes and reduction in voltage levels in modern microprocessors induce
higher soft error rates. This trend makes reliability a primary design constraint for computer …

Hardware error detection using AN-codes

U Schiffel - 2011 - academia.edu
Due to the continuously decreasing feature sizes and the increasing complexity of integrated
circuits, commercial off-the-shelf (COTS) hardware is becoming less and less reliable …

Exploiting idle hardware to provide low overhead fault tolerance for VLIW processors

AL Sartor, AF Lorenzon, L Carro… - ACM Journal on …, 2017 - dl.acm.org
Because of technology scaling, the soft error rate has been increasing in digital circuits,
which affects system reliability. Therefore, modern processors, including VLIW architectures …

A survey on post-silicon functional validation for multicore architectures

P Jayaraman, R Parthasarathi - ACM Computing Surveys (CSUR), 2017 - dl.acm.org
During a processor development cycle, post-silicon validation is performed on the first
fabricated chip to detect and fix design errors. Design errors occur due to functional issues …

A survey of checker architectures

R Kalayappan, SR Sarangi - ACM Computing Surveys (CSUR), 2013 - dl.acm.org
Reliability is quickly becoming a primary design constraint for high-end processors because
of the inherent limits of manufacturability, extreme miniaturization of transistors, and the …

A survey of fault models and fault tolerance methods for 2D bus-based multi-core systems and TSV-based 3D NoC many-core systems

S Venkatesha, R Parthasarathi - arXiv preprint arXiv:2203.07830, 2022 - arxiv.org
Reliability has taken centre stage in the development of high-performance computing
processors. A surge of interest is noticeable in recent times in formulating fault and failure …

32-Bit one instruction core: A low-cost, reliable, and fault-tolerant core for multicore systems

S Venkatesha, R Parthasarathi - Journal of Testing …, 2019 - asmedigitalcollection.asme.org
Occurrences of both transient and permanent errors pose a major challenge in the wake of
burgeoning growth in transistor density. Manufacturing defects and process variants lead to …

Design of a reliable cache system for heterogeneous CMPs

B Chakraborty, M Dalui, BK Sikdar - Journal of Circuits, Systems and …, 2018 - World Scientific
The embedded system-on-a-chip (SoC) that integrates heterogeneous processors with
variation in coherence protocol adds complexity in maintaining coherency in the data …