Optimization techniques for GPU programming

P Hijma, S Heldens, A Sclocco… - ACM Computing …, 2023 - dl.acm.org
In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …

AWB-GCN: A graph convolutional network accelerator with runtime workload rebalancing

T Geng, A Li, R Shi, C Wu, T Wang, Y Li… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
Deep learning systems have been successfully applied to Euclidean data such as images,
video, and audio. In many applications, however, information and their relationships are …

CSR5: An efficient storage format for cross-platform sparse matrix-vector multiplication

W Liu, B Vinter - Proceedings of the 29th ACM on International …, 2015 - dl.acm.org
Sparse matrix-vector multiplication (SpMV) is a fundamental building block for numerous
applications. In this paper, we propose CSR5 (Compressed Sparse Row 5), a new storage …

SpaceA: Sparse matrix vector multiplication on processing-in-memory accelerator

X Xie, Z Liang, P Gu, A Basak, L Deng… - … Symposium on High …, 2021 - ieeexplore.ieee.org
Sparse matrix-vector multiplication (SpMV) is an important primitive across a wide range of
application domains such as scientific computing and graph analytics. Due to its intrinsic …

Adaptive sparse tiling for sparse matrix multiplication

C Hong, A Sukumaran-Rajam, I Nisa, K Singh… - Proceedings of the 24th …, 2019 - dl.acm.org
Tiling is a key technique for data locality optimization and is widely used in high-
performance implementations of dense matrix-matrix multiplication for multicore/manycore …

GPGPU performance and power estimation using machine learning

G Wu, JL Greathouse, A Lyashevsky… - 2015 IEEE 21st …, 2015 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) have numerous configuration and design options,
including core frequency, number of parallel compute units (CUs), and available memory …

Smash: Co-designing software compression and hardware-accelerated indexing for efficient sparse matrix operations

K Kanellopoulos, N Vijaykumar, C Giannoula… - Proceedings of the …, 2019 - dl.acm.org
Important workloads, such as machine learning and graph analytics applications, heavily
involve sparse linear algebra operations. These operations use sparse matrix compression …

Sparse matrix-vector multiplication on GPGPUs

S Filippone, V Cardellini, D Barbieri… - ACM Transactions on …, 2017 - dl.acm.org
The multiplication of a sparse matrix by a dense vector (SpMV) is a centerpiece of scientific
computing applications: it is the essential kernel for the solution of sparse linear systems and …

Sparse-TPU: Adapting systolic arrays for sparse matrices

X He, S Pal, A Amarnath, S Feng, DH Park… - Proceedings of the 34th …, 2020 - dl.acm.org
While systolic arrays are widely used for dense-matrix operations, they are seldom used for
sparse-matrix operations. In this paper, we show how a systolic array of Multiply-and …

Merge-based parallel sparse matrix-vector multiplication

D Merrill, M Garland - SC'16: Proceedings of the International …, 2016 - ieeexplore.ieee.org
We present a strictly balanced method for the parallel computation of sparse matrix-vector
products (SpMV). Our algorithm operates directly upon the Compressed Sparse Row (CSR) …