Understanding GPU power: A survey of profiling, modeling, and simulation methods

RA Bridges, N Imam, TM Mintz - ACM Computing Surveys (CSUR), 2016 - dl.acm.org
Modern graphics processing units (GPUs) have complex architectures that admit
exceptional performance and energy efficiency for high-throughput applications. Although …

A performance analysis framework for identifying potential benefits in GPGPU applications

J Sim, A Dasgupta, H Kim, R Vuduc - Proceedings of the 17th ACM …, 2012 - dl.acm.org
Tuning code for GPGPU and other emerging many-core platforms is a challenge because
few models or tools can precisely pinpoint the root cause of performance bottlenecks. In this …

Cross-architecture performance prediction (XAPP) using CPU code to predict GPU performance

N Ardalani, C Lestourgeon, K Sankaralingam… - Proceedings of the 48th …, 2015 - dl.acm.org
GPUs have become prevalent and more general purpose, but GPU programming remains
challenging and time consuming for the majority of programmers. In addition, it is not always …

Automated SmartNIC offloading insights for network functions

Y Qiu, J Xing, KF Hsu, Q Kang, M Liu… - Proceedings of the …, 2021 - dl.acm.org
The gap between CPU and networking speeds has motivated the development of
SmartNICs for NF (network functions) offloading. However, offloading performance is …

Scalable kernel fusion for memory-bound GPU applications

M Wahib, N Maruyama - SC'14: Proceedings of the …, 2014 - ieeexplore.ieee.org
GPU implementations of HPC applications relying on finite difference methods can include
tens of kernels that are memory-bound. Kernel fusion can improve performance by reducing …

Performance upper bound analysis and optimization of SGEMM on Fermi and Kepler GPUs

J Lai, A Seznec - Proceedings of the 2013 IEEE/ACM …, 2013 - ieeexplore.ieee.org
In this paper, we present an approach to estimate GPU applications' performance upper
bound based on algorithm analysis and assembly code level benchmarking. As an example …

Predicting GPU performance from CPU runs using machine learning

I Baldini, SJ Fink, E Altman - 2014 IEEE 26th International …, 2014 - ieeexplore.ieee.org
Graphics processing units (GPUs) can deliver considerable performance gains over general
purpose processors. However, GPU performance improvements vary considerably across …

Optimizing CUDA code by kernel fusion: application on BLAS

J Filipovič, M Madzin, J Fousek, L Matyska - The Journal of …, 2015 - Springer
Contemporary GPUs have significantly higher arithmetic throughput than memory
throughput. Hence, many GPU kernels are memory bound and cannot exploit arithmetic …

A survey of performance modeling and simulation techniques for accelerator-based computing

U Lopez-Novoa, A Mendiburu… - IEEE Transactions on …, 2014 - ieeexplore.ieee.org
The high performance computing landscape is shifting from collections of homogeneous
nodes towards heterogeneous systems, in which nodes consist of a combination of …

Palm: Easing the burden of analytical performance modeling

NR Tallent, A Hoisie - Proceedings of the 28th ACM international …, 2014 - dl.acm.org
Analytical (predictive) application performance models are critical for diagnosing
performance-limiting resources, optimizing systems, and designing machines. Creating …