[BOOK][B] CUDA application design and development

R Farber - 2011 - books.google.com
As the computer industry retools to leverage massively parallel graphics processing units
(GPUs), this book is designed to meet the needs of working software developers who need …

Verified instruction-level energy consumption measurement for NVIDIA GPUs

Y Arafa, A ElWazir, A ElKanishy, Y Aly… - Proceedings of the 17th …, 2020 - dl.acm.org
GPUs are prevalent in modern computing systems at all scales. They consume a significant
fraction of the energy in these systems. However, vendors do not publish the actual cost of …

High-performance matrix-matrix multiplications of very small matrices

I Masliah, A Abdelfattah, A Haidar, S Tomov… - Euro-Par 2016: Parallel …, 2016 - Springer
The use of the general dense matrix-matrix multiplication (GEMM) is fundamental for
obtaining high performance in many scientific computing applications. GEMMs for small …

Investigating power capping toward energy‐efficient scientific applications

A Haidar, H Jagode, P Vaccaro… - Concurrency and …, 2019 - Wiley Online Library
The emergence of power efficiency as a primary constraint in processor and system design
poses new challenges concerning power and energy awareness for numerical libraries and …

CUDAAdvisor: LLVM-based runtime profiling for modern GPUs

D Shen, SL Song, A Li, X Liu - … of the 2018 International Symposium on …, 2018 - dl.acm.org
General-purpose GPUs have been widely utilized to accelerate parallel applications. Given
a relatively complex programming model and fast architecture evolution, producing efficient …

An Empirical Study of High Performance Computing (HPC) Performance Bugs

MAK Azad, N Iqbal, F Hassan… - 2023 IEEE/ACM 20th …, 2023 - ieeexplore.ieee.org
Performance efficiency and scalability are the major design goals for high performance
computing (HPC) applications. However, it is challenging to achieve high efficiency and …

Data partitioning on heterogeneous multicore and multi-GPU systems using functional performance models of data-parallel applications

Z Zhong, V Rychkov… - 2012 IEEE international …, 2012 - ieeexplore.ieee.org
Transition to hybrid CPU/GPU platforms in high performance computing is challenging in the
aspect of efficient utilisation of the heterogeneous hardware and existing optimised software …

Tools for top-down performance analysis of GPU-accelerated applications

K Zhou, MW Krentel, J Mellor-Crummey - Proceedings of the 34th ACM …, 2020 - dl.acm.org
This paper describes extensions to Rice University's HPCToolkit performance tools to
support measurement and analysis of GPU-accelerated applications. To help developers …

Autotuning GPU kernels via static and predictive analysis

R Lim, B Norris, A Malony - 2017 46th international conference …, 2017 - ieeexplore.ieee.org
Optimizing the performance of GPU kernels is challenging for both human programmers and
code generators. For example, CUDA programmers must set thread and block parameters …

Measurement and analysis of GPU-accelerated applications with HPCToolkit

K Zhou, L Adhianto, J Anderson, A Cherian… - Parallel Computing, 2021 - Elsevier
To address the challenge of performance analysis on the US DOE's forthcoming exascale
supercomputers, Rice University has been extending its HPCToolkit performance tools to …