Reliability and energy efficiency in cloud computing systems: Survey and taxonomy
With the popularity of cloud computing, it has become crucial to provide on-demand services
dynamically according to the user's requirements. Reliability and energy efficiency are two …
Task failure prediction in cloud data centers using deep learning
A large-scale cloud data center needs to provide high service reliability and availability with
low failure occurrence probability. However, current large-scale cloud data centers still face …
Failure-aware resource provisioning for hybrid cloud infrastructure
Hybrid Cloud computing has been receiving increasing attention recently. In order to realize
the full potential of the hybrid Cloud platform, an architectural framework for efficiently …
CloudPD: Problem determination and diagnosis in shared dynamic clouds
In this work, we address problem determination in virtualized clouds. We show that high
dynamism, resource sharing, frequent reconfiguration, high propensity to faults and …
The Failure Trace Archive: Enabling the comparison of failure measurements and models of distributed systems
With the increasing presence, scale, and complexity of distributed systems, resource failures
are becoming an important and practical topic of computer science research. While …
Control-based load-balancing techniques: Analysis and performance evaluation via a randomized optimization approach
Cloud applications are often subject to unexpected events like flashcrowds and hardware
failures. Users that expect a predictable behavior may abandon an unresponsive application …
Modeling stochastic correlated failures and their effects on network reliability
The physical infrastructure of communication networks is vulnerable to spatially correlated
failures arising from various physical stresses such as natural disasters (earthquakes and …
Dependency mining for service resilience at the edge
The edge computing paradigm is prone to failures as it trades reliability against other quality of
service properties such as low latency and geographical prevalence. Therefore, software …
Analysis and modeling of time-correlated failures in large-scale distributed systems
The analysis and modeling of the failures bound to occur in today's large-scale production
systems is invaluable in providing the understanding needed to make these systems fault …
On evaluating self-adaptive and self-healing systems using chaos engineering
With the growing adoption of self-adaptive systems in various domains, there is an
increasing need for strategies to assess their correct behavior. In particular self-healing …