[BOOK][B] CUDA application design and development
R Farber - 2011 - books.google.com
As the computer industry retools to leverage massively parallel graphics processing units
(GPUs), this book is designed to meet the needs of working software developers who need …
Verified instruction-level energy consumption measurement for NVIDIA GPUs
GPUs are prevalent in modern computing systems at all scales. They consume a significant
fraction of the energy in these systems. However, vendors do not publish the actual cost of …
High-performance matrix-matrix multiplications of very small matrices
The use of the general dense matrix-matrix multiplication (GEMM) is fundamental for
obtaining high performance in many scientific computing applications. GEMMs for small …
Investigating power capping toward energy‐efficient scientific applications
The emergence of power efficiency as a primary constraint in processor and system design
poses new challenges concerning power and energy awareness for numerical libraries and …
CUDAAdvisor: LLVM-based runtime profiling for modern GPUs
General-purpose GPUs have been widely utilized to accelerate parallel applications. Given
a relatively complex programming model and fast architecture evolution, producing efficient …
An Empirical Study of High Performance Computing (HPC) Performance Bugs
Performance efficiency and scalability are the major design goals for high performance
computing (HPC) applications. However, it is challenging to achieve high efficiency and …
Data partitioning on heterogeneous multicore and multi-GPU systems using functional performance models of data-parallel applications
Z Zhong, V Rychkov… - 2012 IEEE international …, 2012 - ieeexplore.ieee.org
Transition to hybrid CPU/GPU platforms in high performance computing is challenging in the
aspect of efficient utilisation of the heterogeneous hardware and existing optimised software …
Tools for top-down performance analysis of GPU-accelerated applications
K Zhou, MW Krentel, J Mellor-Crummey - Proceedings of the 34th ACM …, 2020 - dl.acm.org
This paper describes extensions to Rice University's HPCToolkit performance tools to
support measurement and analysis of GPU-accelerated applications. To help developers …
Autotuning GPU kernels via static and predictive analysis
Optimizing the performance of GPU kernels is challenging for both human programmers and
code generators. For example, CUDA programmers must set thread and block parameters …
Measurement and analysis of GPU-accelerated applications with HPCToolkit
To address the challenge of performance analysis on the US DOE's forthcoming exascale
supercomputers, Rice University has been extending its HPCToolkit performance tools to …