A survey on automatic parameter tuning for big data processing systems

H Herodotou, Y Chen, J Lu - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of
configuration parameters controlling parallelism, I/O behavior, memory settings, and …

MapReduce scheduling algorithms in Hadoop: a systematic study

S Hedayati, N Maleki, T Olsson, F Ahlgren… - Journal of Cloud …, 2023 - Springer
Hadoop is a framework for storing and processing huge volumes of data on clusters. It uses
Hadoop Distributed File System (HDFS) for storing data and uses MapReduce to process …

Big data analysis-based secure cluster management for optimized control plane in software-defined networks

J Wu, M Dong, K Ota, J Li… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
In software-defined networks (SDNs), the abstracted control plane is its symbolic
characteristic, whose core component is the software-based controller. The control plane is …

IoTDeM: An IoT Big Data-oriented MapReduce performance prediction extended model in multiple edge clouds

Z Lu, N Wang, J Wu, M Qiu - Journal of Parallel and Distributed Computing, 2018 - Elsevier
Uploading all IoT Big Data to a centralized cloud for data analytics is infeasible
because of the excessive latency and bandwidth limitation of the Internet. A promising …

Predictive performance modeling for distributed batch processing using black box monitoring and machine learning

C Witt, M Bux, W Gusew, U Leser - Information Systems, 2019 - Elsevier
In many domains, the previous decade was characterized by increasing data volumes and
growing complexity of data analyses, creating new demands for batch processing on …

Towards analyzing the performance of hybrid edge-cloud processing

D Loghin, L Ramapantulu… - 2019 IEEE International …, 2019 - ieeexplore.ieee.org
While edge computing is gaining traction, organizations operating in geographically
distributed locations are still using cloud computing to collect and post-process data. In this …

Transition phase classification and prediction

J Lau, S Schoenmackers… - … Symposium on High …, 2005 - ieeexplore.ieee.org
Most programs are repetitive, where similar behavior can be seen at different execution
times. Proposed on-line systems automatically group these similar intervals of execution into …

Task scheduling in big data platforms: a systematic literature review

M Soualhia, F Khomh, S Tahar - Journal of Systems and Software, 2017 - Elsevier
Context: Hadoop, Spark, Storm, and Mesos are very well known frameworks in both
research and industrial communities that allow expressing and processing distributed …

A dynamic and failure-aware task scheduling framework for Hadoop

M Soualhia, F Khomh, S Tahar - IEEE Transactions on Cloud …, 2018 - ieeexplore.ieee.org
Hadoop has become a popular framework for processing data-intensive applications in
cloud environments. A core constituent of Hadoop is the scheduler, which is responsible for …

AutoToken: Predicting peak parallelism for big data analytics at Microsoft

R Sen, A Jindal, H Patel, S Qiao - Proceedings of the VLDB Endowment, 2020 - dl.acm.org
Right-sizing resource allocation for big-data queries, particularly in serverless environments,
is critical for improving infrastructure operational efficiency, capacity availability, query …