A framework for general sparse matrix–matrix multiplication on GPUs and heterogeneous processors
W Liu, B Vinter - Journal of Parallel and Distributed Computing, 2015 - Elsevier
General sparse matrix–matrix multiplication (SpGEMM) is a fundamental building block for
numerous applications such as the algebraic multigrid method (AMG), breadth-first search and …
Hypergraph partitioning for sparse matrix-matrix multiplication
We propose a fine-grained hypergraph model for sparse matrix-matrix multiplication
(SpGEMM), a key computational kernel in scientific computing and data analysis whose …
The input/output complexity of triangle enumeration
We consider the well-known problem of enumerating all triangles of an undirected graph.
Our focus is on determining the input/output (I/O) complexity of this problem. Let E be the …
Fast Matrix Multiplication for Query Processing
X Hu - Proceedings of the ACM on Management of Data, 2024 - dl.acm.org
This paper studies how to use fast matrix multiplication to speed up query processing. As
observed, computing a two-table join and then projecting away the join attribute is …
Hypergraph Partitioning for Parallel Sparse Matrix-Matrix Multiplication
The performance of parallel algorithms for sparse matrix-matrix multiplication is typically
determined by the amount of interprocessor communication performed, which in turn …
[PDF] Parallel and scalable sparse basic linear algebra subprograms
W Liu - 2015 - nbi.ku.dk
Sparse basic linear algebra subprograms (BLAS) are fundamental building blocks for
numerous scientific computations and graph applications. Compared with Dense BLAS …
Optimization of Sparse Matrix Computation for Algebraic Multigrid on GPUs
AMG is one of the most efficient and widely used methods for solving sparse linear systems.
The computational process of AMG mainly consists of a series of iterative calculations of …
The I/O complexity of Strassen's matrix multiplication with recomputation
A tight Ω((n/M)^(log₂ 7) · M) lower bound is derived on the I/O complexity
of Strassen's algorithm to multiply two n × n matrices, in a two-level storage hierarchy with M …
Fine-grained I/O complexity via reductions: New lower bounds, faster algorithms, and a time hierarchy
This paper initiates the study of I/O algorithms (minimizing cache misses) from the
perspective of fine-grained complexity (conditional polynomial lower bounds). Specifically …
perspective of fine-grained complexity (conditional polynomial lower bounds). Specifically …
The I/O Complexity of Attention, or How Optimal is Flash Attention?
Self-attention is at the heart of the popular Transformer architecture, yet suffers from
quadratic time and memory complexity. The breakthrough FlashAttention algorithm revealed …