Software fault tolerance: A tutorial

W Torres-Pomales - 2000 - ntrs.nasa.gov
Because of our present inability to produce error-free software, software fault tolerance is
and will continue to be an important consideration in software systems. The root cause of …

Proactive management of software aging

V Castelli, RE Harper, P Heidelberger… - IBM Journal of …, 2001 - ieeexplore.ieee.org
Software failures are now known to be a dominant source of system outages. Several
studies and much anecdotal evidence point to “software aging” as a common phenomenon …

Failure data analysis of a large-scale heterogeneous server environment

RK Sahoo, MS Squillante… - … and Networks, 2004, 2004 - ieeexplore.ieee.org
The growing complexity of hardware and software mandates the recognition of fault
occurrence in system deployment and management. While there are several techniques to …

SHARPE at the age of twenty two

KS Trivedi, R Sahner - ACM SIGMETRICS Performance Evaluation …, 2009 - dl.acm.org
This paper discusses the modeling tool called SHARPE (Symbolic Hierarchical Automated
Reliability and Performance Evaluator), a general hierarchical modeling tool that analyzes …

An empirical failure-analysis of a large-scale cloud computing environment

P Garraghan, P Townend, J Xu - 2014 IEEE 15th International …, 2014 - ieeexplore.ieee.org
Cloud computing research is in great need of statistical parameters derived from the
analysis of real-world systems. One aspect of this is the failure characteristics of Cloud …

Performance implications of failures in large-scale cluster scheduling

Y Zhang, MS Squillante, A Sivasubramaniam… - … Strategies for Parallel …, 2005 - Springer
As we continue to evolve into large-scale parallel systems, many of them employing
hundreds of computing engines to take on mission-critical roles, it is crucial to design those …

Efficient pattern-based time series classification on GPU

KW Chang, B Deka, WMW Hwu… - 2012 IEEE 12th …, 2012 - ieeexplore.ieee.org
Time series shapelet discovery algorithm finds subsequences from a set of time series for
use as primitives for time series classification. This algorithm has drawn a lot of interest …

A model for availability analysis of distributed software/hardware systems

CD Lai, M Xie, KL Poh, YS Dai, P Yang - Information and software …, 2002 - Elsevier
System availability is a major performance concern in distributed systems design and
analysis. A typical kind of application on distributed systems has a homogeneously …

Making services fault tolerant

PPW Chan, MR Lyu, M Malek - … , ISAS 2006, Helsinki, Finland, May 15-16 …, 2006 - Springer
With ever growing use of Internet, Web services become increasingly popular and their
growth rate surpasses even the most optimistic predictions. Services are self-descriptive, self …