A survey on automatic parameter tuning for big data processing systems
Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of
configuration parameters controlling parallelism, I/O behavior, memory settings, and …
Datasize-aware high dimensional configurations auto-tuning of in-memory cluster computing
Z Yu, Z Bei, X Qian - Proceedings of the Twenty-Third International …, 2018 - dl.acm.org
In-Memory cluster Computing (IMC) frameworks (e.g., Spark) have become increasingly
important because they typically achieve more than 10× speedups over the traditional On …
Mronline: Mapreduce online performance tuning
MapReduce job parameter tuning is a daunting and time consuming task. The parameter
configuration space is huge; there are more than 70 parameters that impact job …
RFHOC: A random-forest approach to auto-tuning hadoop's configuration
Z Bei, Z Yu, H Zhang, W Xiong, C Xu… - … on Parallel and …, 2015 - ieeexplore.ieee.org
Hadoop is a widely-used implementation framework of the MapReduce programming model
for large-scale data processing. Hadoop performance however is significantly affected by …
Memtune: Dynamic memory management for in-memory data analytic platforms
Memory is a crucial resource for big data processing frameworks such as Spark and M3R,
where the memory is used both for computation and for caching intermediate storage data …
Towards automatic parameter tuning of stream processing systems
Optimizing the performance of big-data streaming applications has become a daunting and
time-consuming task: parameters may be tuned from a space of hundreds or even …
To tune or not to tune? in search of optimal configurations for data analytics
This experimental study presents a number of issues that pose a challenge for practical
configuration tuning and its deployment in data analytics frameworks. These issues include …
Rafiki: A middleware for parameter tuning of nosql datastores for dynamic metagenomics workloads
High performance computing (HPC) applications, such as metagenomics and other big data
systems, need to store and analyze huge volumes of semi-structured data. Such …
Kea: Tuning an exabyte-scale data infrastructure
Microsoft's internal big-data infrastructure is one of the largest in the world, with over 300k
machines running billions of tasks from over 0.6 M daily jobs. Operating this infrastructure is …
Gml: efficiently auto-tuning flink's configurations via guided machine learning
Y Guo, H Shan, S Huang, K Hwang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
The increasingly popular fused batch-streaming big data framework, Apache Flink, has
many performance-critical as well as untamed configuration parameters. However, how to …