MapReduce scheduling algorithms in Hadoop: a systematic study

S Hedayati, N Maleki, T Olsson, F Ahlgren… - Journal of Cloud …, 2023 - Springer
Hadoop is a framework for storing and processing huge volumes of data on clusters. It uses
Hadoop Distributed File System (HDFS) for storing data and uses MapReduce to process …

A survey on automatic parameter tuning for big data processing systems

H Herodotou, Y Chen, J Lu - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of
configuration parameters controlling parallelism, I/O behavior, memory settings, and …

AGILE: Elastic distributed resource scaling for Infrastructure-as-a-Service

H Nguyen, Z Shen, X Gu, S Subbiah… - … Conference on Autonomic …, 2013 - usenix.org
Dynamically adjusting the number of virtual machines (VMs) assigned to a cloud application
to keep up with load changes and interference from other uses typically requires detailed …

IoTDeM: An IoT Big Data-oriented MapReduce performance prediction extended model in multiple edge clouds

Z Lu, N Wang, J Wu, M Qiu - Journal of Parallel and Distributed Computing, 2018 - Elsevier
Uploading all IoT Big Data to a centralized cloud for data analytics is infeasible
because of the excessive latency and bandwidth limitation of the Internet. A promising …

iShuffle: Improving Hadoop performance with shuffle-on-write

Y Guo, J Rao, D Cheng, X Zhou - IEEE transactions on parallel …, 2016 - ieeexplore.ieee.org
Hadoop is a popular implementation of the MapReduce framework for running
data-intensive jobs on clusters of commodity servers. Shuffle, the all-to-all input data fetching …

Hadoop performance modeling for job estimation and resource provisioning

M Khan, Y Jin, M Li, Y Xiang… - IEEE Transactions on …, 2015 - ieeexplore.ieee.org
MapReduce has become a major computing model for data intensive applications. Hadoop,
an open source implementation of MapReduce, has been adopted by an increasingly …

Improving performance of heterogeneous MapReduce clusters with adaptive task tuning

D Cheng, J Rao, Y Guo, C Jiang… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org
Datacenter-scale clusters are evolving toward heterogeneous hardware architectures due to
continuous server replacement. Meanwhile, datacenters are commonly shared by many …

Optimizing analytic data flows for multiple execution engines

A Simitsis, K Wilkinson, M Castellanos… - Proceedings of the 2012 …, 2012 - dl.acm.org
Next generation business intelligence involves data flows that span different execution
engines, contain complex functionality like data/text analytics, machine learning operations …

Heterogeneity and interference-aware virtual machine provisioning for predictable performance in the cloud

F Xu, F Liu, H Jin - IEEE Transactions on Computers, 2015 - ieeexplore.ieee.org
Infrastructure-as-a-service (IaaS) cloud providers offer tenants elastic computing resources
in the form of virtual machine (VM) instances to run their jobs. Recently, providing …

Cost-effective resource provisioning for MapReduce in a cloud

B Palanisamy, A Singh, L Liu - IEEE Transactions on Parallel …, 2014 - ieeexplore.ieee.org
This paper presents a new MapReduce cloud service model, Cura, for provisioning
cost-effective MapReduce services in a cloud. In contrast to existing MapReduce cloud services …