Big Data with Cloud Computing: an insight on the computing environment, MapReduce, and programming frameworks

A Fernández, S del Río, V López… - … : Data Mining and …, 2014 - Wiley Online Library
The term 'Big Data' has spread rapidly in the framework of Data Mining and Business
Intelligence. This new scenario can be defined by means of those problems that cannot be …

An analysis of the current status and countermeasures of bike-sharing in the background of Internet

X Gao, S Zhao, S Yibo - 2018 International Conference on …, 2018 - ieeexplore.ieee.org
With the continuous rapid growth of China's overall economy, the role of urban transport in
social and economic development has become increasingly significant. However, it also …

Toward an optimal online checkpoint solution under a two-level HPC checkpoint model

S Di, Y Robert, F Vivien… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org
The traditional single-level checkpointing method suffers from significant overhead on
large-scale platforms. Hence, multilevel checkpointing protocols have been studied extensively in …

A utilization model for optimization of checkpoint intervals in distributed stream processing systems

S Jayasekara, A Harwood, S Karunasekera - Future Generation Computer …, 2020 - Elsevier
State-of-the-art distributed stream processing systems such as Apache Flink and Storm have
recently included checkpointing to provide fault-tolerance for stateful applications. This is a …

Unified model for assessing checkpointing protocols at extreme‐scale

G Bosilca, A Bouteiller, E Brunet… - Concurrency and …, 2014 - Wiley Online Library
In this paper, we present a unified model for several well‐known checkpoint/restart
protocols. The proposed model is generic enough to encompass both extremes of the …

Multi-phase task-based HPC applications: Quickly learning how to run fast

LL Nesi, LM Schnorr, A Legrand - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Parallel applications performance strongly depends on the number of resources. Although
adding new nodes usually reduces execution time, excessive amounts are often detrimental …

Optimizing checkpoint‐based fault‐tolerance in distributed stream processing systems: Theory to practice

S Jayasekara, S Karunasekera… - Software: Practice and …, 2022 - Wiley Online Library
Fault‐tolerance is an essential part of a stream processing system that guarantees data
analysis could continue even after failures. State‐of‐the‐art distributed stream processing …

Research on optimal checkpointing-interval for Flink stream processing applications

Z Zhang, W Li, X Qing, X Liu, H Liu - Mobile Networks and Applications, 2021 - Springer
Nowadays various distributed stream processing systems (DSPSs) are employed to process
the ever-expanding real-time data. The DSPSs are highly susceptible to system failure, and …

Improving checkpointing intervals by considering individual job failure probabilities

A Frank, M Baumgartner, R Salkhordeh… - 2021 IEEE …, 2021 - ieeexplore.ieee.org
Checkpointing is a popular resilience method in HPC and its efficiency highly depends on
the choice of the checkpoint interval. Standard analytical approaches optimize intervals for …

Unified fault-tolerance framework for hybrid task-parallel message-passing applications

O Subasi, T Martsinkevich… - … Journal of High …, 2018 - journals.sagepub.com
We present a unified fault-tolerance framework for task-parallel message-passing
applications to mitigate transient errors. First, we propose a fault-tolerant message-logging …