Reliability and energy efficiency in cloud computing systems: Survey and taxonomy

Y Sharma, B Javadi, W Si, D Sun - Journal of Network and Computer …, 2016 - Elsevier
With the popularity of cloud computing, it has become crucial to provide on-demand services
dynamically according to the user's requirements. Reliability and energy efficiency are two …

Task failure prediction in cloud data centers using deep learning

J Gao, H Wang, H Shen - IEEE Transactions on Services …, 2020 - ieeexplore.ieee.org
A large-scale cloud data center needs to provide high service reliability and availability with
low failure occurrence probability. However, current large-scale cloud data centers still face …

Failure-aware resource provisioning for hybrid cloud infrastructure

B Javadi, J Abawajy, R Buyya - Journal of Parallel and Distributed …, 2012 - Elsevier
Hybrid Cloud computing is receiving increasing attention in recent days. In order to realize
the full potential of the hybrid Cloud platform, an architectural framework for efficiently …

CloudPD: Problem determination and diagnosis in shared dynamic clouds

B Sharma, P Jayachandran, A Verma… - 2013 43rd Annual …, 2013 - ieeexplore.ieee.org
In this work, we address problem determination in virtualized clouds. We show that high
dynamism, resource sharing, frequent reconfiguration, high propensity to faults and …

The Failure Trace Archive: Enabling the comparison of failure measurements and models of distributed systems

B Javadi, D Kondo, A Iosup, D Epema - Journal of Parallel and Distributed …, 2013 - Elsevier
With the increasing presence, scale, and complexity of distributed systems, resource failures
are becoming an important and practical topic of computer science research. While …

Control-based load-balancing techniques: Analysis and performance evaluation via a randomized optimization approach

AV Papadopoulos, C Klein, M Maggio… - Control Engineering …, 2016 - Elsevier
Cloud applications are often subject to unexpected events like flashcrowds and hardware
failures. Users that expect a predictable behavior may abandon an unresponsive application …

Modeling stochastic correlated failures and their effects on network reliability

M Rahnamay-Naeini, JE Pezoa, G Azar… - 2011 Proceedings of …, 2011 - ieeexplore.ieee.org
The physical infrastructure of communication networks is vulnerable to spatially correlated
failures arising from various physical stresses such as natural disasters (earthquakes and …

Dependency mining for service resilience at the edge

A Aral, I Brandic - 2018 IEEE/ACM Symposium on Edge …, 2018 - ieeexplore.ieee.org
Edge computing paradigm is prone to failures as it trades reliability against other quality of
service properties such as low latency and geographical prevalence. Therefore, software …

Analysis and modeling of time-correlated failures in large-scale distributed systems

N Yigitbasi, M Gallet, D Kondo, A Iosup… - 2010 11th IEEE/ACM …, 2010 - ieeexplore.ieee.org
The analysis and modeling of the failures bound to occur in today's large-scale production
systems is invaluable in providing the understanding needed to make these systems fault …

On evaluating self-adaptive and self-healing systems using chaos engineering

MA Naqvi, S Malik, M Astekin… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
With the growing adoption of self-adaptive systems in various domains, there is an
increasing need for strategies to assess their correct behavior. In particular self-healing …