TileSpGEMM: A tiled algorithm for parallel sparse general matrix-matrix multiplication on GPUs

Y Niu, Z Lu, H Ji, S Song, Z **, W Liu - Proceedings of the 27th ACM …, 2022 - dl.acm.org
Sparse general matrix-matrix multiplication (SpGEMM) is one of the most fundamental
building blocks in sparse linear solvers, graph processing frameworks and machine learning …
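As a reference point for what SpGEMM computes, here is a minimal sketch of the classic row-by-row (Gustavson) formulation over CSR inputs. It is not TileSpGEMM's tiled algorithm. The tuple-based CSR layout and the name `spgemm_gustavson` are illustrative assumptions. Row i of C is accumulated as the sum, over nonzeros a_ik, of a_ik times row k of B, using a hash-map accumulator.

```python
from typing import Dict, List, Tuple

# Hypothetical CSR layout chosen for illustration: (row_ptr, col_idx, values).
CSR = Tuple[List[int], List[int], List[float]]


def spgemm_gustavson(a: CSR, b: CSR) -> CSR:
    """Compute C = A * B one row at a time (Gustavson's row-wise SpGEMM)."""
    a_ptr, a_col, a_val = a
    b_ptr, b_col, b_val = b
    c_ptr: List[int] = [0]
    c_col: List[int] = []
    c_val: List[float] = []
    for i in range(len(a_ptr) - 1):
        acc: Dict[int, float] = {}  # sparse accumulator for row i of C
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k, a_ik = a_col[p], a_val[p]
            # Row i of C gains a_ik times row k of B.
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_col[q]
                acc[j] = acc.get(j, 0.0) + a_ik * b_val[q]
        for j in sorted(acc):  # emit columns in ascending order
            c_col.append(j)
            c_val.append(acc[j])
        c_ptr.append(len(c_col))
    return c_ptr, c_col, c_val


A = ([0, 2, 3], [0, 2, 1], [1.0, 2.0, 3.0])     # [[1, 0, 2], [0, 3, 0]]
B = ([0, 1, 2, 3], [1, 0, 0], [4.0, 5.0, 6.0])  # [[0, 4], [5, 0], [6, 0]]
print(spgemm_gustavson(A, B))  # ([0, 2, 3], [0, 1, 0], [12.0, 4.0, 15.0])
```

The cost of each output row depends on how many rows of B it touches, so work per row can be highly irregular. That irregularity is a general challenge for parallel SpGEMM.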

HASpGEMM: Heterogeneity-aware sparse general matrix-matrix multiplication on modern asymmetric multicore processors

H Cheng, W Li, Y Lu, W Liu - … of the 52nd International Conference on …, 2023 - dl.acm.org
Sparse general matrix-matrix multiplication (SpGEMM) is an important kernel in
computational science and engineering, and has been widely studied on homogeneous …

TileSpMSpV: A tiled algorithm for sparse matrix-sparse vector multiplication on GPUs

H Ji, H Song, S Lu, Z **, G Tan, W Liu - Proceedings of the 51st …, 2022 - dl.acm.org
Sparse matrix-sparse vector multiplication (SpMSpV) is an important primitive for graph
algorithms and machine learning applications. The sparsity of the input and output vectors …
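To show why the sparsity of the input vector matters, here is a minimal sketch of a column-driven SpMSpV over a CSC matrix. Only the columns selected by nonzeros of x are visited, so the work scales with the nonzeros in those columns rather than with the full matrix. This is not TileSpMSpV's tiled method. The CSC tuple layout, the dict-based sparse vector and the name `spmspv` are illustrative assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical CSC layout for illustration: (col_ptr, row_idx, values).
CSC = Tuple[List[int], List[int], List[float]]
SparseVec = Dict[int, float]  # index -> nonzero value


def spmspv(a: CSC, x: SparseVec) -> SparseVec:
    """Compute y = A x, visiting only the columns selected by nonzeros of x."""
    col_ptr, row_idx, vals = a
    y: SparseVec = {}
    for j, x_j in x.items():
        # Scatter x_j times column j of A into the output.
        for p in range(col_ptr[j], col_ptr[j + 1]):
            i = row_idx[p]
            y[i] = y.get(i, 0.0) + vals[p] * x_j
    return y


A = ([0, 2, 3, 4], [0, 2, 1, 0], [1.0, 4.0, 3.0, 2.0])  # [[1, 0, 2], [0, 3, 0], [4, 0, 0]]
print(spmspv(A, {0: 1.0, 2: 5.0}))  # {0: 11.0, 2: 4.0}
```

The output sparsity is not known in advance. It depends on which rows the selected columns touch, which is part of what makes SpMSpV harder to parallelize than dense-vector SpMV.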

HAM-SpMSpV: an Optimized Parallel Algorithm for Masked Sparse Matrix-Sparse Vector Multiplications on multi-core CPUs

L Xu, H Jia, Y Zhang, L Wang, X Jiang - Proceedings of the 33rd …, 2024 - dl.acm.org
The efficiency of Sparse Matrix-Sparse Vector Multiplication (SpM-SpV) is critically important
in fields such as machine learning and graph analytics. In certain algorithms, masked …
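The snippet is cut off before it defines the mask. This sketch therefore assumes the common GraphBLAS-style meaning: output entries are computed only at indices in a mask set, such as the unvisited vertices in a BFS step. It uses a row-driven (pull) loop over CSR so that rows outside the mask are never read. It is an illustration, not HAM-SpMSpV's algorithm. The name `masked_spmspv` and the data layouts are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical CSR layout for illustration: (row_ptr, col_idx, values).
CSR = Tuple[List[int], List[int], List[float]]
SparseVec = Dict[int, float]  # index -> nonzero value


def masked_spmspv(a: CSR, x: SparseVec, mask: Set[int]) -> SparseVec:
    """Compute y = (A x) only at indices in mask (assumed structural mask)."""
    row_ptr, col_idx, vals = a
    y: SparseVec = {}
    for i in sorted(mask):  # rows outside the mask are never read
        s, hit = 0.0, False
        for p in range(row_ptr[i], row_ptr[i + 1]):
            j = col_idx[p]
            if j in x:  # only products with a nonzero x_j contribute
                s += vals[p] * x[j]
                hit = True
        if hit:
            y[i] = s
    return y


A = ([0, 2, 3, 4], [0, 2, 1, 0], [1.0, 2.0, 3.0, 4.0])  # [[1, 0, 2], [0, 3, 0], [4, 0, 0]]
print(masked_spmspv(A, {0: 1.0, 2: 5.0}, {2}))  # {2: 4.0}
```

A pull loop like this pays off when the mask is small. When x is very sparse and the mask is large, a push (column-driven) traversal that discards writes outside the mask is typically cheaper, so masked SpMSpV implementations often have to choose between the two.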

[BOOK] Extending Vector Processing Units for Enhanced Linear Algebra Performance

MV Maceiras - 2024 - search.proquest.com
Vector Processing Units (VPUs) have made a comeback to the landscape of
computer architecture as a response to the diminishing returns from technology scaling and …

DeltaSPARSE: High-Performance Sparse General Matrix-Matrix Multiplication on Multi-GPU Systems

S Yang, C Zhang, J Ma - 2023 IEEE 30th International …, 2023 - ieeexplore.ieee.org
Sparse General Matrix-Matrix Multiplication (SpGEMM) serves as a fundamental operation
in the domains of sparse linear algebra and graph data processing. The majority of existing …

Performance optimization of parallel reduction and scan primitives on ReRAM architectures.

金洲, 段懿洳, 伊恩鑫, 戢昊男… - Journal of National …, 2022 - search.ebscohost.com
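Reduction and scan are core primitives in parallel computing, and accelerating them in parallel is crucial. However, the data movement that cannot be avoided under the von Neumann architecture exposes them to performance and power bottlenecks such as the "memory wall". In recent years, ReRAM-based …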
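The snippet does not describe the ReRAM-specific techniques. For reference, here is a minimal sketch of the two primitives themselves in their standard parallel forms: a tree-based reduction and the work-efficient Blelloch exclusive scan (up-sweep, then down-sweep). Both are simulated sequentially. The iterations of each inner loop are independent and would run in parallel on real hardware. The function names are hypothetical.

```python
from typing import List


def tree_reduce(xs: List[int]) -> int:
    """Pairwise (tree) sum: about log2(n) levels, additions within a level independent."""
    vals = list(xs)
    while len(vals) > 1:
        if len(vals) % 2:
            vals.append(0)  # pad with 0, the identity of +
        vals = [vals[k] + vals[k + 1] for k in range(0, len(vals), 2)]
    return vals[0] if vals else 0


def blelloch_exclusive_scan(xs: List[int]) -> List[int]:
    """Work-efficient exclusive prefix sum; input padded to a power of two."""
    n = 1
    while n < len(xs):
        n *= 2
    a = list(xs) + [0] * (n - len(xs))
    # Up-sweep: build partial sums in place, exactly like a reduction tree.
    d = 1
    while d < n:
        for k in range(0, n, 2 * d):
            a[k + 2 * d - 1] += a[k + d - 1]
        d *= 2
    # Down-sweep: clear the root, then push prefixes back down the tree.
    a[n - 1] = 0
    d = n // 2
    while d >= 1:
        for k in range(0, n, 2 * d):
            t = a[k + d - 1]
            a[k + d - 1] = a[k + 2 * d - 1]
            a[k + 2 * d - 1] += t
        d //= 2
    return a[: len(xs)]


data = [3, 1, 7, 0, 4, 1, 6, 3]
print(tree_reduce(data))              # 25
print(blelloch_exclusive_scan(data))  # [0, 3, 4, 11, 11, 15, 16, 22]
```

The up-sweep of the scan is the same tree as the reduction, so both primitives share the access pattern whose data movement the snippet identifies as the bottleneck.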