A survey on automatic parameter tuning for big data processing systems

H Herodotou, Y Chen, J Lu - ACM Computing Surveys (CSUR), 2020 - dl.acm.org
Big data processing systems (e.g., Hadoop, Spark, Storm) contain a vast number of
configuration parameters controlling parallelism, I/O behavior, memory settings, and …

CherryPick: Adaptively unearthing the best cloud configurations for big data analytics

O Alipourfard, HH Liu, J Chen… - … USENIX Symposium on …, 2017 - usenix.org
Picking the right cloud configuration for recurring big data analytics jobs running in clouds is
hard, because there can be tens of possible VM instance types and even more cluster sizes …

Using meta-heuristics and machine learning for software optimization of parallel computing systems: a systematic literature review

S Memeti, S Pllana, A Binotto, J Kołodziej, I Brandic - Computing, 2019 - Springer
While modern parallel computing systems offer high performance, utilizing these powerful
computing resources to the highest possible extent demands advanced knowledge of …

Adapting multi-objectivized software configuration tuning

T Chen, M Li - Proceedings of the ACM on Software Engineering, 2024 - dl.acm.org
When tuning software configuration for better performance (e.g., latency or throughput), an
important issue that many optimizers face is the presence of local optimum traps …

Using machine learning to optimize parallelism in big data applications

ÁB Hernández, MS Perez, S Gupta… - Future Generation …, 2018 - Elsevier
In-memory cluster computing platforms have gained momentum in the last years, due to their
ability to analyse big amounts of data in parallel. These platforms are complex and difficult-to …

Do performance aspirations matter for guiding software configuration tuning? An empirical investigation under dual performance objectives

T Chen, M Li - ACM Transactions on Software Engineering and …, 2023 - dl.acm.org
Configurable software systems can be tuned for better performance. Leveraging on some
Pareto optimizers, recent work has shifted from tuning for a single, time-related performance …

Multi-objectivizing software configuration tuning

T Chen, M Li - Proceedings of the 29th ACM Joint Meeting on …, 2021 - dl.acm.org
Automatically tuning software configuration for optimizing a single performance attribute (e.g.,
minimizing latency) is not trivial, due to the nature of the configuration systems (e.g., complex …

Black or white? How to develop an autotuner for memory-based analytics

M Kunjir, S Babu - Proceedings of the 2020 ACM SIGMOD International …, 2020 - dl.acm.org
There is a lot of interest today in building autonomous (or, self-driving) data processing
systems. An emerging school of thought is to leverage AI-driven "black box" algorithms for …

Improving performance of heterogeneous MapReduce clusters with adaptive task tuning

D Cheng, J Rao, Y Guo, C Jiang… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org
Datacenter-scale clusters are evolving toward heterogeneous hardware architectures due to
continuous server replacement. Meanwhile, datacenters are commonly shared by many …

Memtune: Dynamic memory management for in-memory data analytic platforms

L Xu, M Li, L Zhang, AR Butt, Y Wang… - 2016 IEEE international …, 2016 - ieeexplore.ieee.org
Memory is a crucial resource for big data processing frameworks such as Spark and M3R,
where the memory is used both for computation and for caching intermediate storage data …