A survey of online failure prediction methods

F Salfner, M Lenk, M Malek - ACM Computing Surveys (CSUR), 2010 - dl.acm.org
With the ever-growing complexity and dynamicity of computer systems, proactive fault
management is an effective approach to enhancing availability. Online failure prediction is …

A survey of software aging and rejuvenation studies

D Cotroneo, R Natella, R Pietrantuono… - ACM Journal on …, 2014 - dl.acm.org
Software aging is a phenomenon plaguing many long-running complex software systems,
which exhibit performance degradation or an increasing failure rate. Several strategies …

The fundamentals of software aging

M Grottke, R Matias, KS Trivedi - 2008 IEEE International …, 2008 - ieeexplore.ieee.org
Since the notion of software aging was introduced thirteen years ago, the interest in this
phenomenon has been increasing from both academia and industry. The majority of the …

Analysis of software aging in a web server

M Grottke, L Li, K Vaidyanathan… - IEEE Transactions on …, 2006 - ieeexplore.ieee.org
Several recent studies have reported & examined the phenomenon that long-running
software systems show an increasing failure rate and/or a progressive degradation of their …

Predictive reliability and fault management in exascale systems: State of the art and perspectives

R Canal, C Hernandez, R Tornero, A Cilardo… - ACM Computing …, 2020 - dl.acm.org
Performance and power constraints come together with Complementary Metal Oxide
Semiconductor technology scaling in future Exascale systems. Technology scaling makes …

A workload-based analysis of software aging, and rejuvenation

Y Bao, X Sun, KS Trivedi - IEEE Transactions on Reliability, 2005 - ieeexplore.ieee.org
We present a hierarchical model for the analysis of proactive fault management in the
presence of system resource leaks. At the low level of the model hierarchy is a degradation …

An experimental study on software aging and rejuvenation in web servers

R Matias, JF Paulo Filho - 30th Annual International Computer …, 2006 - ieeexplore.ieee.org
Several studies have been conducted in order to understand the 'software
aging' phenomenon. This paper presents the results of an experimental research work …

Software aging and rejuvenation: Where we are and where we are going

D Cotroneo, R Natella, R Pietrantuono… - 2011 IEEE Third …, 2011 - ieeexplore.ieee.org
After 16 years, a significant body of knowledge has been established in the area of Software
Aging and Rejuvenation (SAR). In this paper, we survey papers about SAR that appeared in …

Leveraging performance counters and execution logs to diagnose memory-related performance issues

MD Syer, ZM Jiang, M Nagappan… - 2013 IEEE …, 2013 - ieeexplore.ieee.org
Load tests ensure that software systems are able to perform under the expected workloads.
The current state of load test analysis requires significant manual review of performance …

A best practice guide to resource forecasting for computing systems

GA Hoffmann, KS Trivedi… - IEEE Transactions on …, 2007 - ieeexplore.ieee.org
Recently, measurement-based studies of software systems have proliferated, reflecting an
increasingly empirical focus on system availability, reliability, aging, and fault tolerance …