Understanding GPU power: A survey of profiling, modeling, and simulation methods
RA Bridges, N Imam, TM Mintz - ACM Computing Surveys (CSUR), 2016 - dl.acm.org
Modern graphics processing units (GPUs) have complex architectures that admit
exceptional performance and energy efficiency for high-throughput applications. Although …
A performance analysis framework for identifying potential benefits in GPGPU applications
Tuning code for GPGPU and other emerging many-core platforms is a challenge because
few models or tools can precisely pinpoint the root cause of performance bottlenecks. In this …
Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance
GPUs have become prevalent and more general purpose, but GPU programming remains
challenging and time consuming for the majority of programmers. In addition, it is not always …
Automated SmartNIC offloading insights for network functions
The gap between CPU and networking speeds has motivated the development of
SmartNICs for NF (network functions) offloading. However, offloading performance is …
Scalable kernel fusion for memory-bound GPU applications
M Wahib, N Maruyama - SC'14: Proceedings of the …, 2014 - ieeexplore.ieee.org
GPU implementations of HPC applications relying on finite difference methods can include
tens of kernels that are memory-bound. Kernel fusion can improve performance by reducing …
Performance upper bound analysis and optimization of SGEMM on Fermi and Kepler GPUs
J Lai, A Seznec - Proceedings of the 2013 IEEE/ACM …, 2013 - ieeexplore.ieee.org
In this paper, we present an approach to estimate GPU applications' performance upper
bound based on algorithm analysis and assembly code level benchmarking. As an example …
Predicting GPU performance from CPU runs using machine learning
Graphics processing units (GPUs) can deliver considerable performance gains over general
purpose processors. However, GPU performance improvements vary considerably across …
Optimizing CUDA code by kernel fusion: application on BLAS
Contemporary GPUs have significantly higher arithmetic throughput than memory
throughput. Hence, many GPU kernels are memory bound and cannot exploit arithmetic …
A survey of performance modeling and simulation techniques for accelerator-based computing
The high performance computing landscape is shifting from collections of homogeneous
nodes towards heterogeneous systems, in which nodes consist of a combination of …
Palm: Easing the burden of analytical performance modeling
Analytical (predictive) application performance models are critical for diagnosing
performance-limiting resources, optimizing systems, and designing machines. Creating …