Self-stabilizing iterative solvers

P Sao, R Vuduc - Proceedings of the workshop on latest advances in …, 2013 - dl.acm.org
We show how to use the idea of self-stabilization, which originates in the context of
distributed control, to make fault-tolerant iterative solvers. Generally, a self-stabilizing system …

RAFTing MapReduce: Fast recovery on the RAFT

JA Quiané-Ruiz, C Pinkel, J Schad… - 2011 IEEE 27th …, 2011 - ieeexplore.ieee.org
MapReduce is a computing paradigm that has gained a lot of popularity as it allows non-
expert users to easily run complex analytical tasks at very large-scale. At such scale, task …

Fail-stop failure algorithm-based fault tolerance for Cholesky decomposition

D Hakkarinen, P Wu, Z Chen - IEEE Transactions on Parallel …, 2014 - ieeexplore.ieee.org
Cholesky decomposition is a widely used algorithm to solve linear equations with symmetric
and positive definite coefficient matrix. With large matrices, this often will be performed on …

FlipBack: automatic targeted protection against silent data corruption

X Ni, LV Kale - SC'16: Proceedings of the International …, 2016 - ieeexplore.ieee.org
The decreasing size of transistors has been critical to the increase in capacity of
supercomputers. The smaller the transistors are, the less energy is required to flip a bit, and thus …

Low-overhead fault-tolerance for the preconditioned conjugate gradient solver

A Schöll, C Braun, MA Kochte… - 2015 IEEE International …, 2015 - ieeexplore.ieee.org
Linear system solvers are an integral part of many different compute-intensive applications
and they benefit from the compute power of heterogeneous computer architectures …

Security in KeyKOS™

SA Rajunas, N Hardy, AC Bomberger… - … IEEE Symposium on …, 1986 - ieeexplore.ieee.org
KeyKOS™ is a capability-based system which was designed to meet the performance,
reliability, and security goals of the commercial computer service marketplace, KeyKOS's …

Characterization of impact of transient faults and detection of data corruption errors in large-scale n-body programs using graphics processing units

KS Yim - 2014 IEEE 28th International Parallel and Distributed …, 2014 - ieeexplore.ieee.org
In N-body programs, trajectories of simulated particles have chaotic patterns if errors are in
the initial conditions or occur during some computation steps. It was believed that the global …

Cross-layer approaches for an aging-aware design of nanoscale microprocessors: Dissertation summary: IEEE TTTC EJ McCluskey doctoral thesis award competition …

F Oboril, MB Tahoori - 2015 IEEE International Test …, 2015 - ieeexplore.ieee.org
As CMOS technologies enter nanometer scales, maintaining the microprocessor reliability
becomes a major design challenge. In particular, accelerated transistor aging is a serious …

Development and convergence analysis of an effective and robust implicit Euler solver for 3D unstructured grids

DF Cavalca, C Bringhenti, GB Campos… - Journal of …, 2018 - Elsevier
This paper reports the development and convergence analysis in steady-state of an effective
and robust implicit finite-volume solver for compressible Euler equations on three …

Applying efficient fault tolerance to enable the preconditioned conjugate gradient solver on approximate computing hardware

A Schöll, C Braun, HJ Wunderlich - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
A new technique is presented that allows executing the preconditioned conjugate gradient
(PCG) solver on approximate hardware while ensuring correct solver results. This technique …