Survey on redundancy based-fault tolerance methods for processors and hardware accelerators - trends in quantum computing, heterogeneous systems and reliability

S Venkatesha, R Parthasarathi - ACM Computing Surveys, 2024 - dl.acm.org
Rapid progress in CMOS technology since the late 1990s has increased the vulnerability of
processors toward faults. Subsequently, the focus of computer architects has shifted toward …

Addressing failures in exascale computing

M Snir, RW Wisniewski, JA Abraham… - … Journal of High …, 2014 - journals.sagepub.com
We present here a report produced by a workshop on 'Addressing failures in exascale
computing' held in Park City, Utah, 4–11 August 2012. The charter of this workshop was to …

Intermittent computation without hardware support or programmer intervention

J Van Der Woude, M Hicks - 12th USENIX Symposium on Operating …, 2016 - usenix.org
As computation scales downward in area, the limitations imposed by the batteries required
to power that computation become more pronounced. Thus, many future devices will forgo …

ThyNVM: Enabling software-transparent crash consistency in persistent memory systems

J Ren, J Zhao, S Khan, J Choi, Y Wu… - Proceedings of the 48th …, 2015 - dl.acm.org
Emerging byte-addressable nonvolatile memories (NVMs) promise persistent memory,
which allows processors to directly access persistent data in main memory. Yet, persistent …

[BOOK] Architecture design for soft errors

S Mukherjee - 2011 - books.google.com
Architecture Design for Soft Errors provides a comprehensive description of the architectural
techniques to tackle the soft error problem. It covers the new methodologies for quantitative …

Detailed design and evaluation of redundant multithreading alternatives

SS Mukherjee, M Kontz, SK Reinhardt - ACM SIGARCH Computer …, 2002 - dl.acm.org
Exponential growth in the number of on-chip transistors, coupled with reductions in voltage
levels, makes each generation of microprocessors increasingly vulnerable to transient faults …

Clank: Architectural support for intermittent computation

M Hicks - ACM SIGARCH Computer Architecture News, 2017 - dl.acm.org
The processors that drive embedded systems are getting smaller; meanwhile, the batteries
used to provide power to those systems have stagnated. If we are to realize the dream of …

A "flight data recorder" for enabling full-system multiprocessor deterministic replay

M Xu, R Bodik, MD Hill - Proceedings of the 30th annual international …, 2003 - dl.acm.org
Debuggers have been proven indispensable in improving software reliability. Unfortunately,
on most real-life software, debuggers fail to deliver their most essential feature---a faithful …

Variability mitigation in nanometer CMOS integrated systems: A survey of techniques from circuits to software

A Rahimi, L Benini, RK Gupta - Proceedings of the IEEE, 2016 - ieeexplore.ieee.org
Variation in performance and power across manufactured parts and their operating
conditions is an accepted reality in modern microelectronic manufacturing processes with …

BugNet: Continuously recording program execution for deterministic replay debugging

S Narayanasamy, G Pokam… - … Symposium on Computer …, 2005 - ieeexplore.ieee.org
Significant time is spent by companies trying to reproduce and fix the bugs that occur for
released code. To assist developers, we propose the BugNet architecture to continuously …