A survey on automatic parameter tuning for big data processing systems
Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of
configuration parameters controlling parallelism, I/O behavior, memory settings, and …
MRONLINE: MapReduce online performance tuning
MapReduce job parameter tuning is a daunting and time-consuming task. The parameter
configuration space is huge; there are more than 70 parameters that impact job …
Datasize-aware high dimensional configurations auto-tuning of in-memory cluster computing
Z Yu, Z Bei, X Qian - Proceedings of the Twenty-Third International …, 2018 - dl.acm.org
In-Memory cluster Computing (IMC) frameworks (e.g., Spark) have become increasingly
important because they typically achieve more than 10× speedups over the traditional On …
RFHOC: A random-forest approach to auto-tuning Hadoop's configuration
Z Bei, Z Yu, H Zhang, W Xiong, C Xu… - … on Parallel and …, 2015 - ieeexplore.ieee.org
Hadoop is a widely-used implementation framework of the MapReduce programming model
for large-scale data processing. Hadoop performance however is significantly affected by …
Memtune: Dynamic memory management for in-memory data analytic platforms
Memory is a crucial resource for big data processing frameworks such as Spark and M3R,
where the memory is used both for computation and for caching intermediate storage data …
Towards automatic parameter tuning of stream processing systems
Optimizing the performance of big-data streaming applications has become a daunting and
time-consuming task: parameters may be tuned from a space of hundreds or even …
Rafiki: A middleware for parameter tuning of NoSQL datastores for dynamic metagenomics workloads
High performance computing (HPC) applications, such as metagenomics and other big data
systems, need to store and analyze huge volumes of semi-structured data. Such …
To tune or not to tune? In search of optimal configurations for data analytics
This experimental study presents a number of issues that pose a challenge for practical
configuration tuning and its deployment in data analytics frameworks. These issues include …
Learning-based automatic parameter tuning for big data analytics frameworks
Big data analytics frameworks (BDAFs) have been widely used for data processing
applications. These frameworks provide a large number of configuration parameters to …
GML: Efficiently auto-tuning Flink's configurations via guided machine learning
Y Guo, H Shan, S Huang, K Hwang… - IEEE Transactions on …, 2021 - ieeexplore.ieee.org
The increasingly popular fused batch-streaming big data framework, Apache Flink, has
many performance-critical as well as untamed configuration parameters. However, how to …