Optimization techniques for GPU programming
In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …
Harnessing GPU tensor cores for fast FP16 arithmetic to speed up mixed-precision iterative refinement solvers
Low-precision floating-point arithmetic is a powerful tool for accelerating scientific computing
applications, especially those in artificial intelligence. Here, we present an investigation …
Measuring energy and power with PAPI
VM Weaver, M Johnson… - 2012 41st …, 2012 - ieeexplore.ieee.org
Energy and power consumption are becoming critical metrics in the design and usage of
high performance systems. We have extended the Performance API (PAPI) analysis library …
A survey on techniques for cooperative CPU-GPU computing
A Graphical Processing Unit provides massive parallelism due to the presence of
hundreds of cores. Usage of GPUs for general-purpose computation (GPGPU) has resulted …
Design and implementation of the Linpack benchmark for single and multi-node systems based on Intel® Xeon Phi coprocessor
A Heinecke, K Vaidyanathan… - 2013 IEEE 27th …, 2013 - ieeexplore.ieee.org
Dense linear algebra has been traditionally used to evaluate the performance and efficiency
of new architectures. This trend has continued for the past half decade with the advent of …
The singular value decomposition: Anatomy of optimizing an algorithm for extreme scale
The computation of the singular value decomposition, or SVD, has a long history with many
improvements over the years, both in its implementations and algorithmically. Here, we …
Simulating low precision floating-point arithmetic
NJ Higham, S Pranesh - SIAM Journal on Scientific Computing, 2019 - SIAM
The half-precision (fp16) floating-point format, defined in the 2008 revision of the IEEE
standard for floating-point arithmetic, and a more recently proposed half-precision format …
Kernel Tuner: A search-optimizing GPU code auto-tuner
B van Werkhoven - Future Generation Computer Systems, 2019 - Elsevier
A very common problem in GPU programming is that some combination of thread block
dimensions and other code optimization parameters, like tiling or unrolling factors, results in …
Mixed-precision iterative refinement using tensor cores on GPUs to accelerate solution of linear systems
Double-precision floating-point arithmetic (FP64) has been the de facto standard for
engineering and scientific simulations for several decades. Problem complexity and the …
A hybridization methodology for high-performance linear algebra software for GPUs
This chapter presents a hybridization methodology for the development
of high-performance linear algebra software for graphics processing units (GPUs). The …