Optimization techniques for GPU programming
In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …
AWB-GCN: A graph convolutional network accelerator with runtime workload rebalancing
Deep learning systems have been successfully applied to Euclidean data such as images,
video, and audio. In many applications, however, information and their relationships are …
CSR5: An efficient storage format for cross-platform sparse matrix-vector multiplication
W Liu, B Vinter - Proceedings of the 29th ACM on International …, 2015 - dl.acm.org
Sparse matrix-vector multiplication (SpMV) is a fundamental building block for numerous
applications. In this paper, we propose CSR5 (Compressed Sparse Row 5), a new storage …
SpaceA: Sparse matrix vector multiplication on processing-in-memory accelerator
Sparse matrix-vector multiplication (SpMV) is an important primitive across a wide range of
application domains such as scientific computing and graph analytics. Due to its intrinsic …
Adaptive sparse tiling for sparse matrix multiplication
Tiling is a key technique for data locality optimization and is widely used in high-
performance implementations of dense matrix-matrix multiplication for multicore/manycore …
GPGPU performance and power estimation using machine learning
G Wu, JL Greathouse, A Lyashevsky… - 2015 IEEE 21st …, 2015 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) have numerous configuration and design options,
including core frequency, number of parallel compute units (CUs), and available memory …
Smash: Co-designing software compression and hardware-accelerated indexing for efficient sparse matrix operations
Important workloads, such as machine learning and graph analytics applications, heavily
involve sparse linear algebra operations. These operations use sparse matrix compression …
Sparse matrix-vector multiplication on GPGPUs
The multiplication of a sparse matrix by a dense vector (SpMV) is a centerpiece of scientific
computing applications: it is the essential kernel for the solution of sparse linear systems and …
Sparse-TPU: Adapting systolic arrays for sparse matrices
While systolic arrays are widely used for dense-matrix operations, they are seldom used for
sparse-matrix operations. In this paper, we show how a systolic array of Multiply-and …
Merge-based parallel sparse matrix-vector multiplication
D Merrill, M Garland - SC'16: Proceedings of the International …, 2016 - ieeexplore.ieee.org
We present a strictly balanced method for the parallel computation of sparse matrix-vector
products (SpMV). Our algorithm operates directly upon the Compressed Sparse Row (CSR) …