Assessing dependability with software fault injection: A survey

R Natella, D Cotroneo, HS Madeira - ACM Computing Surveys (CSUR), 2016 - dl.acm.org
With the rise of software complexity, software-related accidents represent a significant threat
for computer-based systems. Software Fault Injection is a method to anticipate worst-case …

Failure diagnosis for distributed systems using targeted fault injection

C Pham, L Wang, BC Tak, S Baset… - … on Parallel and …, 2016 - ieeexplore.ieee.org
This paper introduces a novel approach to automating failure diagnostics in distributed
systems by combining fault injection and data analytics. We use fault injection to populate …

Distributed job manager recovery

JR Challenger, LR Degenaro, JR Giles… - US Patent …, 2010 - Google Patents
A method is provided for the recovery of an instance of a job manager
running on one of a plurality of nodes used to execute the processing elements associated …

Fault injection analytics: A novel approach to discover failure modes in cloud-computing systems

D Cotroneo, L De Simone, P Liguori… - IEEE transactions on …, 2020 - ieeexplore.ieee.org
Cloud computing systems fail in complex and unexpected ways due to unexpected
combinations of events and interactions between hardware and software components. Fault …

Enhancing the analysis of software failures in cloud computing systems with deep learning

D Cotroneo, L De Simone, P Liguori… - Journal of Systems and …, 2021 - Elsevier
Identifying the failure modes of cloud computing systems is a difficult and time-consuming
task, due to the growing complexity of such systems, and the large volume and noisiness of …

EDFI: A dependable fault injection tool for dependability benchmarking experiments

C Giuffrida, A Kuijsten… - 2013 IEEE 19th Pacific …, 2013 - ieeexplore.ieee.org
Fault injection is a pivotal technique in dependability benchmarking. Unfortunately, existing
general-purpose fault injection tools either inject faults in predetermined memory locations …

No pain, no gain? The utility of parallel fault injections

S Winter, O Schwahn, R Natella, N Suri… - 2015 IEEE/ACM 37th …, 2015 - ieeexplore.ieee.org
Software Fault Injection (SFI) is an established technique for assessing the robustness of a
software under test by exposing it to faults in its operational environment. Depending on the …

Jgroup/ARM: a distributed object group platform with autonomous replication management

H Meling, A Montresor, BE Helvik… - Software: Practice and …, 2008 - Wiley Online Library
This paper presents the design and implementation of Jgroup/ARM, a distributed object
group platform with autonomous replication management along with a novel measurement …

Method for testing the fault tolerance of MapReduce frameworks

JE Marynowski, AO Santin, AR Pimentel - Computer Networks, 2015 - Elsevier
A MapReduce framework abstracts distributed system issues, integrating a distributed file
system with an application's needs. However, the lack of determinism in distributed system …

Towards autonomic fault recovery in System-S

G Jacques-Silva, J Challenger… - … Computing (ICAC'07 …, 2007 - ieeexplore.ieee.org
System-S is a stream processing infrastructure which enables program fragments to be
distributed and connected to form complex applications. There may be potentially tens of …