Kernel Tuner: A search-optimizing GPU code auto-tuner

B van Werkhoven - Future Generation Computer Systems, 2019 - Elsevier
A very common problem in GPU programming is that some combination of thread block
dimensions and other code optimization parameters, like tiling or unrolling factors, results in …

GPGPU performance estimation with core and memory frequency scaling

Q Wang, X Chu - IEEE Transactions on Parallel and Distributed …, 2020 - ieeexplore.ieee.org
Contemporary graphics processing units (GPUs) support dynamic voltage and frequency
scaling to balance computational performance and energy consumption. However, accurate …

Bayesian Optimization for auto-tuning GPU kernels

FJ Willemsen, R van Nieuwpoort… - … and Simulation of …, 2021 - ieeexplore.ieee.org
Finding optimal parameter configurations for tunable GPU kernels is a non-trivial exercise
for large search spaces, even when automated. This poses an optimization task on a …

Benchmarking optimization algorithms for auto-tuning GPU kernels

RA Schoonhoven, B van Werkhoven… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Recent years have witnessed phenomenal growth in the application and capabilities of
graphical processing units (GPUs) due to their high parallel computation power at relatively …

Going green: optimizing GPUs for energy efficiency through model-steered auto-tuning

R Schoonhoven, B Veenboer… - 2022 IEEE/ACM …, 2022 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) have revolutionized the computing landscape over the
past decade. However, the growing energy demands of data centres and computing …

csTuner: Scalable auto-tuning framework for complex stencil computation on GPUs

Q Sun, Y Liu, H Yang, Z Jiang, X Liu… - 2021 IEEE …, 2021 - ieeexplore.ieee.org
The computational patterns of stencil operations are commonly used in HPC applications.
Many HPC platforms utilize the computation capability of GPUs to accelerate stencil …

LS-CAT: a large-scale CUDA AutoTuning dataset

L Bjertnes, JO Tørring, AC Elster - … International Conference on …, 2021 - ieeexplore.ieee.org
The effectiveness of Machine Learning (ML) methods depends on access to large suitable
datasets. In this article, we present how we build the LS-CAT (Large-Scale CUDA …

Adaptive Auto-Tuning Framework for Global Exploration of Stencil Optimization on GPUs

Q Sun, Y Liu, H Yang, Z Jiang, Z Luan… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Stencil computations are widely used in high performance computing (HPC) applications.
Many HPC platforms utilize the high computation capability of GPUs to accelerate stencil …

Coding Ants: Optimization of GPU code using ant colony optimization

E Papenhausen, K Mueller - Computer Languages, Systems & Structures, 2018 - Elsevier
This article proposes the Coding Ants framework, an approach for auto-tuning which uses
ant colony optimization to find a sequence of code optimizations for GPU architectures. The …

Optimal Kernel Tuning Parameter Prediction using Deep Sequence Models

K Mahmood, J Khan, H Afzal - arXiv preprint arXiv:2404.10162, 2024 - arxiv.org
GPU kernels have come to the forefront of computing due to their utility in varied fields, from
high-performance computing to machine learning. A typical GPU compute kernel is invoked …