Task scheduling in big data platforms: a systematic literature review

M Soualhia, F Khomh, S Tahar - Journal of Systems and Software, 2017 - Elsevier
Context: Hadoop, Spark, Storm, and Mesos are very well known frameworks in both
research and industrial communities that allow expressing and processing distributed …

Energy-aware scheduling of MapReduce jobs for big data applications

L Mashayekhy, MM Nejad, D Grosu… - IEEE Transactions on …, 2014 - ieeexplore.ieee.org
The majority of large-scale data intensive applications executed by data centers are based
on MapReduce or its open-source implementation, Hadoop. Such applications are executed …

A dynamic and failure-aware task scheduling framework for Hadoop

M Soualhia, F Khomh, S Tahar - IEEE Transactions on Cloud …, 2018 - ieeexplore.ieee.org
Hadoop has become a popular framework for processing data-intensive applications in
cloud environments. A core constituent of Hadoop is the scheduler, which is responsible for …

A survey of big data machine learning applications optimization in cloud data centers and networks

SH Mohamed, TEH El-Gorashi… - arXiv preprint arXiv …, 2019 - arxiv.org
This survey article reviews the challenges associated with deploying and optimizing big data
applications and machine learning algorithms in cloud data centers and networks. The …

Apache Hadoop YARN parameter configuration challenges and optimization

BJ Mathiya, VL Desai - 2015 International Conference on Soft …, 2015 - ieeexplore.ieee.org
Apache Hadoop Yarn is an open source framework for distributed as well as local storage,
processing and analysis of big data on commodity hardware. It provides MapReduce …

An efficient MapReduce scheduling scheme for processing large multimedia data

K Bok, J Hwang, J Lim, Y Kim, J Yoo - Multimedia Tools and Applications, 2017 - Springer
In this paper, we propose a scheduling scheme to minimize the deadline miss of jobs to
which deadlines are assigned when processing large multimedia data such as video and …

Incremental FP-Growth mining strategy for dynamic threshold value and database based on MapReduce

X Wei, Y Ma, F Zhang, M Liu… - Proceedings of the 2014 …, 2014 - ieeexplore.ieee.org
With the coming of the Big Data era, data mining has been confronted with new
opportunities and challenges. Some limitations are exposed when traditional association …

Performance modeling for I/O‐intensive applications on virtual machines

T Bhattacharya, X Peng, J Mao, C Zhang… - Concurrency and …, 2022 - Wiley Online Library
Models for virtual machines running on cloud computing systems. Modeling system
behaviors of clouds is a grand challenge because the resource utilization in VMs is …

An enhanced data-locality-aware task scheduling algorithm for Hadoop applications

D Choi, M Jeon, N Kim, BD Lee - IEEE Systems Journal, 2017 - ieeexplore.ieee.org
In general, Hadoop improves the task scheduling performance by determining data locality
based on the location in which the input splits and MapTask are executed. However, if an …

A requirements specification framework for big data collection and capture

N Al-Najran, A Dahanayake - New Trends in Databases and Information …, 2015 - Springer
The ad hoc processes of data gathering used by most organizations nowadays are proving
to be inadequate in a world that is expanding with infinite information. As a consequence …