Optimization techniques for GPU programming

P Hijma, S Heldens, A Sclocco… - ACM Computing …, 2023 - dl.acm.org
In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …

Harnessing GPU tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers

A Haidar, S Tomov, J Dongarra… - … Conference for High …, 2018 - ieeexplore.ieee.org
Low-precision floating-point arithmetic is a powerful tool for accelerating scientific computing
applications, especially those in artificial intelligence. Here, we present an investigation …

Measuring energy and power with PAPI

VM Weaver, M Johnson… - 2012 41st …, 2012 - ieeexplore.ieee.org
Energy and power consumption are becoming critical metrics in the design and usage of
high performance systems. We have extended the Performance API (PAPI) analysis library …

A survey on techniques for cooperative CPU-GPU computing

K Raju, NN Chiplunkar - Sustainable Computing: Informatics and Systems, 2018 - Elsevier
Graphical Processing Unit provides massive parallelism due to the presence of
hundreds of cores. Usage of GPUs for general purpose computation (GPGPU) has resulted …

Design and implementation of the Linpack benchmark for single and multi-node systems based on Intel® Xeon Phi coprocessor

A Heinecke, K Vaidyanathan… - 2013 IEEE 27th …, 2013 - ieeexplore.ieee.org
Dense linear algebra has been traditionally used to evaluate the performance and efficiency
of new architectures. This trend has continued for the past half decade with the advent of …

The singular value decomposition: Anatomy of optimizing an algorithm for extreme scale

J Dongarra, M Gates, A Haidar, J Kurzak, P Luszczek… - SIAM Review, 2018 - SIAM
The computation of the singular value decomposition, or SVD, has a long history with many
improvements over the years, both in its implementations and algorithmically. Here, we …

Simulating low precision floating-point arithmetic

NJ Higham, S Pranesh - SIAM Journal on Scientific Computing, 2019 - SIAM
The half-precision (fp16) floating-point format, defined in the 2008 revision of the IEEE
standard for floating-point arithmetic, and a more recently proposed half-precision format …

Kernel Tuner: A search-optimizing GPU code auto-tuner

B van Werkhoven - Future Generation Computer Systems, 2019 - Elsevier
A very common problem in GPU programming is that some combination of thread block
dimensions and other code optimization parameters, like tiling or unrolling factors, results in …

Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems

A Haidar, H Bayraktar, S Tomov… - … of the Royal …, 2020 - royalsocietypublishing.org
Double-precision floating-point arithmetic (FP64) has been the de facto standard for
engineering and scientific simulations for several decades. Problem complexity and the …

A hybridization methodology for high-performance linear algebra software for GPUs

E Agullo, C Augonnet, J Dongarra, H Ltaief… - GPU Computing Gems …, 2012 - Elsevier
This chapter presents a hybridization methodology for the development
of high-performance linear algebra software for graphics processing units (GPUs). The …