A survey on compiler autotuning using machine learning

AH Ashouri, W Killian, J Cavazos, G Palermo… - ACM Computing …, 2018 - dl.acm.org
Since the mid-1990s, researchers have been trying to use machine-learning-based
approaches to solve a number of different compiler optimization problems. These …

Deep configuration performance learning: A systematic survey and taxonomy

J Gong, T Chen - ACM Transactions on Software Engineering and …, 2024 - dl.acm.org
Performance is arguably the most crucial attribute that reflects the quality of a configurable
software system. However, given the increasing scale and complexity of modern software …

End-to-end deep learning of optimization heuristics

C Cummins, P Petoumenos, Z Wang… - 2017 26th …, 2017 - ieeexplore.ieee.org
Accurate automatic optimization heuristics are necessary for dealing with the complexity and
diversity of modern hardware and software. Machine learning is a proven technique for …

Bliss: auto-tuning complex applications using a pool of diverse lightweight learning models

RB Roy, T Patel, V Gadepally, D Tiwari - Proceedings of the 42nd ACM …, 2021 - dl.acm.org
As parallel applications become more complex, auto-tuning becomes more desirable,
challenging, and time-consuming. We propose Bliss, a novel solution for auto-tuning …

Asymo: scalable and efficient deep-learning inference on asymmetric mobile cpus

M Wang, S Ding, T Cao, Y Liu, F Xu - Proceedings of the 27th Annual …, 2021 - dl.acm.org
On-device deep learning (DL) inference has attracted vast interest. Mobile CPUs are the
most common hardware for on-device inference and many inference frameworks have been …

Synthesizing benchmarks for predictive modeling

C Cummins, P Petoumenos, Z Wang… - 2017 IEEE/ACM …, 2017 - ieeexplore.ieee.org
Predictive modeling using machine learning is an effective method for building compiler
heuristics, but there is a shortage of benchmarks. Typical machine learning experiments …

CLBlast: A tuned OpenCL BLAS library

C Nugteren - Proceedings of the International Workshop on OpenCL, 2018 - dl.acm.org
This work introduces CLBlast, an open-source BLAS library providing optimized OpenCL
routines to accelerate dense linear algebra for a wide variety of devices. It is targeted at …

Kernel Tuner: A search-optimizing GPU code auto-tuner

B van Werkhoven - Future Generation Computer Systems, 2019 - Elsevier
A very common problem in GPU programming is that some combination of thread block
dimensions and other code optimization parameters, like tiling or unrolling factors, results in …

A benchmark set of highly-efficient CUDA and OpenCL kernels and its dynamic autotuning with Kernel Tuning Toolkit

F Petrovič, D Střelák, J Hozzová, J Ol'ha… - Future Generation …, 2020 - Elsevier
In recent years, the heterogeneity of both commodity and supercomputers hardware has
increased sharply. Accelerators, such as GPUs or Intel Xeon Phi co-processors, are often …

Romou: Rapidly generate high-performance tensor kernels for mobile gpus

R Liang, T Cao, J Wen, M Wang, Y Wang… - Proceedings of the 28th …, 2022 - dl.acm.org
Mobile GPU, as a ubiquitous and powerful accelerator, plays an important role in
accelerating on-device DNN (Deep Neural Network) inference. The frequent-upgrade and …