Preparing sparse solvers for exascale computing

H Anzt, E Boman, R Falgout… - … of the Royal …, 2020 - royalsocietypublishing.org
Sparse solvers provide essential functionality for a wide variety of scientific applications.
Highly parallel sparse solvers are essential for continuing advances in high-fidelity, multi …

Pangulu: A scalable regular two-dimensional block-cyclic sparse direct solver on distributed heterogeneous systems

X Fu, B Zhang, T Wang, W Li, Y Lu, E Yi… - Proceedings of the …, 2023 - dl.acm.org
Sparse direct solvers play a vital role in large-scale high performance computing in science
and engineering. Existing distributed sparse direct methods employ multifrontal/supernodal …

Unified Communication Optimization Strategies for Sparse Triangular Solver on CPU and GPU Clusters

Y Liu, N Ding, P Sao, S Williams, XS Li - Proceedings of the International …, 2023 - dl.acm.org
This paper presents a unified communication optimization framework for sparse triangular
solve (SpTRSV) algorithms on CPU and GPU clusters. The framework builds upon a 3D …

Harnessing the crowd for autotuning high-performance computing applications

Y Cho, JW Demmel, J King, XS Li… - 2023 IEEE …, 2023 - ieeexplore.ieee.org
This paper presents GPTuneCrowd, a crowd-based autotuning framework for tuning high-
performance computing applications. GPTuneCrowd collects performance data from various …

A supernodal all-pairs shortest path algorithm

P Sao, R Kannan, P Gera, R Vuduc - Proceedings of the 25th ACM …, 2020 - dl.acm.org
We show how to exploit graph sparsity in the Floyd-Warshall algorithm for the all-pairs
shortest path (APSP) problem. Floyd-Warshall is an attractive choice for APSP on high …

GraphFly: Efficient asynchronous streaming graphs processing via dependency-flow

D Chen, C Gui, Y Zhang, H Jin, L Zheng… - … Conference for High …, 2022 - ieeexplore.ieee.org
Existing streaming graph processing systems typically adopt two phases of refinement and
recomputation to ensure the correctness of the incremental computation. However, severe …

Accelerating Large-Scale Sparse LU Factorization for RF Circuit Simulation

G Feng, H Wang, Z Guo, M Li, T Zhao, Z Jin… - … Conference on Parallel …, 2024 - Springer
Sparse LU factorization is the indispensable building block of the circuit simulation, and
dominates the simulation time, especially when dealing with large-scale circuits. Radio …

A distributed-memory algorithm for computing a heavy-weight perfect matching on bipartite graphs

A Azad, A Buluç, XS Li, X Wang, J Langguth - SIAM Journal on Scientific …, 2020 - SIAM
We design and implement an efficient parallel algorithm for finding a perfect matching in a
weighted bipartite graph such that weights on the edges of the matching are large. This …

swSuperLU: A highly scalable sparse direct solver on Sunway manycore architecture

M Tian, J Wang, Z Zhang, W Du, J Pan, T Liu - The Journal of …, 2022 - Springer
Sparse LU factorization is essential for scientific and engineering simulations. In this work,
we present swSuperLU, a highly scalable sparse direct solver on Sunway manycore …

A communication-avoiding 3D algorithm for sparse LU factorization on heterogeneous systems

P Sao, XS Li, R Vuduc - Journal of Parallel and Distributed Computing, 2019 - Elsevier
We propose a new algorithm to improve the strong scalability of right-looking sparse LU
factorization on distributed memory systems. Our 3D algorithm for sparse LU uses a three …