A survey on automatic parameter tuning for big data processing systems

H Herodotou, Y Chen, J Lu - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of
configuration parameters controlling parallelism, I/O behavior, memory settings, and …

Datasize-aware high dimensional configurations auto-tuning of in-memory cluster computing

Z Yu, Z Bei, X Qian - Proceedings of the Twenty-Third International …, 2018 - dl.acm.org
In-Memory cluster Computing (IMC) frameworks (e.g., Spark) have become increasingly
important because they typically achieve more than 10× speedups over the traditional On …

MRONLINE: MapReduce online performance tuning

M Li, L Zeng, S Meng, J Tan, L Zhang, AR Butt… - Proceedings of the 23rd …, 2014 - dl.acm.org
MapReduce job parameter tuning is a daunting and time-consuming task. The parameter
configuration space is huge; there are more than 70 parameters that impact job …

RFHOC: A random-forest approach to auto-tuning Hadoop's configuration

Z Bei, Z Yu, H Zhang, W Xiong, C Xu… - … on Parallel and …, 2015 - ieeexplore.ieee.org
Hadoop is a widely-used implementation framework of the MapReduce programming model
for large-scale data processing. Hadoop performance, however, is significantly affected by …

MEMTUNE: Dynamic memory management for in-memory data analytic platforms

L Xu, M Li, L Zhang, AR Butt, Y Wang… - 2016 IEEE international …, 2016 - ieeexplore.ieee.org
Memory is a crucial resource for big data processing frameworks such as Spark and M3R,
where the memory is used both for computation and for caching intermediate storage data …

Towards automatic parameter tuning of stream processing systems

M Bilal, M Canini - Proceedings of the 2017 Symposium on Cloud …, 2017 - dl.acm.org
Optimizing the performance of big-data streaming applications has become a daunting and
time-consuming task: parameters may be tuned from a space of hundreds or even …

To tune or not to tune? In search of optimal configurations for data analytics

A Fekry, L Carata, T Pasquier, A Rice… - Proceedings of the 26th …, 2020 - dl.acm.org
This experimental study presents a number of issues that pose a challenge for practical
configuration tuning and its deployment in data analytics frameworks. These issues include …

Rafiki: A middleware for parameter tuning of NoSQL datastores for dynamic metagenomics workloads

A Mahgoub, P Wood, S Ganesh, S Mitra… - Proceedings of the 18th …, 2017 - dl.acm.org
High performance computing (HPC) applications, such as metagenomics and other big data
systems, need to store and analyze huge volumes of semi-structured data. Such …

Kea: Tuning an exabyte-scale data infrastructure

Y Zhu, S Krishnan, K Karanasos, I Tarte… - Proceedings of the …, 2021 - dl.acm.org
Microsoft's internal big-data infrastructure is one of the largest in the world, with over 300k
machines running billions of tasks from over 0.6M daily jobs. Operating this infrastructure is …

GML: Efficiently auto-tuning Flink's configurations via guided machine learning

Y Guo, H Shan, S Huang, K Hwang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
The increasingly popular fused batch-streaming big data framework, Apache Flink, has
many performance-critical as well as untamed configuration parameters. However, how to …