TileSpGEMM: A tiled algorithm for parallel sparse general matrix-matrix multiplication on GPUs

Y Niu, Z Lu, H Ji, S Song, Z Jin, W Liu - Proceedings of the 27th ACM …, 2022 - dl.acm.org
Sparse general matrix-matrix multiplication (SpGEMM) is one of the most fundamental
building blocks in sparse linear solvers, graph processing frameworks and machine learning …

HASpGEMM: Heterogeneity-aware sparse general matrix-matrix multiplication on modern asymmetric multicore processors

H Cheng, W Li, Y Lu, W Liu - … of the 52nd International Conference on …, 2023 - dl.acm.org
Sparse general matrix-matrix multiplication (SpGEMM) is an important kernel in
computational science and engineering, and has been widely studied on homogeneous …

AMGT: Algebraic multigrid solver on tensor cores

Y Lu, L Zeng, T Wang, X Fu, W Li… - … Conference for High …, 2024 - ieeexplore.ieee.org
Algebraic multigrid (AMG) methods are particularly efficient to solve a wide range of sparse
linear systems, due to their good flexibility and adaptability. Even though modern parallel …

MPI+ULT: Overlapping communication and computation with user-level threads

H Lu, S Seo, P Balaji - … on Cyberspace Safety and Security, and …, 2015 - ieeexplore.ieee.org
As the core density of future processors keeps increasing, MPI+Threads is becoming a
promising programming model for large scale SMP clusters. Generally speaking, hybrid …

Algebraic multigrid domain and range decomposition (AMG-DD/AMG-RD)

R Bank, R Falgout, T Jones, TA Manteuffel… - SIAM Journal on …, 2015 - SIAM
In modern large-scale supercomputing applications, algebraic multigrid (AMG) is a leading
choice for solving matrix equations. However, the high cost of communication relative to that …

Data-driven performance modeling of linear solvers for sparse matrices

JS Yeom, JJ Thiagarajan, A Bhatele… - … and Simulation of …, 2016 - ieeexplore.ieee.org
Performance of scientific codes is increasingly dependent on the input problem, its data
representation and the underlying hardware with the increase in code and architectural …

FP16 Acceleration in Structured Multigrid Preconditioner for Real-World Applications

Y Zong, P Yu, H Huang, W Xue - … of the 53rd International Conference on …, 2024 - dl.acm.org
Half-precision hardware support is now almost ubiquitous. In contrast to its active use in AI,
half-precision is less commonly employed in scientific and engineering computing. The …

End-to-end performance modeling of distributed GPU applications

J Choi, DF Richards, LV Kale, A Bhatele - Proceedings of the 34th ACM …, 2020 - dl.acm.org
With the growing number of GPU-based supercomputing platforms and GPU-enabled
applications, the ability to accurately model the performance of such applications is …

Improving performance of the hypre iterative solver for Uintah combustion codes on manycore architectures using MPI endpoints and kernel consolidation

D Sahasrabudhe, M Berzins - … , Amsterdam, The Netherlands, June 3–5 …, 2020 - Springer
The solution of large-scale combustion problems with codes such as the Arches component
of Uintah on next generation computer architectures requires the use of a many and multi …

Optimizing the hypre solver for manycore and GPU architectures

D Sahasrabudhe, R Zambre… - Journal of …, 2021 - Elsevier
The solution of large-scale combustion problems with codes such as Uintah on modern
computer architectures requires the use of multithreading and GPUs to achieve …