Evaluating modern GPU interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect

A Li, SL Song, J Chen, J Li, X Liu… - … on Parallel and …, 2019 - ieeexplore.ieee.org
High performance multi-GPU computing becomes an inevitable trend due to the ever-
increasing demand on computation capability in emerging domains such as deep learning …

PanguLU: A scalable regular two-dimensional block-cyclic sparse direct solver on distributed heterogeneous systems

X Fu, B Zhang, T Wang, W Li, Y Lu, E Yi… - Proceedings of the …, 2023 - dl.acm.org
Sparse direct solvers play a vital role in large-scale high performance computing in science
and engineering. Existing distributed sparse direct methods employ multifrontal/supernodal …
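The title names a regular two-dimensional block-cyclic layout. In the generic form of that layout, the matrix is cut into square tiles and tile (bi, bj) is owned by process (bi mod Pr, bj mod Pc) on a Pr × Pc process grid, which spreads tiles evenly across processes. The sketch below shows only that generic mapping. The function names and the 4 × 4 tile / 2 × 2 grid example are illustrative assumptions, and PanguLU's own tiling, sparse tile formats, and scheduling are not reproduced here.

```python
from typing import Dict, List, Tuple


def block_cyclic_owner(bi: int, bj: int, pr: int, pc: int) -> Tuple[int, int]:
    """Return the (row, col) process-grid coordinates that own tile (bi, bj).

    Generic 2D block-cyclic rule: tile rows cycle over the pr grid rows and
    tile columns cycle over the pc grid columns.
    """
    return bi % pr, bj % pc


def tiles_per_process(nb_rows: int, nb_cols: int,
                      pr: int, pc: int) -> Dict[Tuple[int, int], List[Tuple[int, int]]]:
    """List the tiles assigned to each process for an nb_rows x nb_cols tile grid."""
    owners: Dict[Tuple[int, int], List[Tuple[int, int]]] = {
        (p, q): [] for p in range(pr) for q in range(pc)
    }
    for bi in range(nb_rows):
        for bj in range(nb_cols):
            owners[block_cyclic_owner(bi, bj, pr, pc)].append((bi, bj))
    return owners


# Hypothetical example: 4 x 4 tiles distributed over a 2 x 2 process grid.
layout = tiles_per_process(4, 4, 2, 2)
print(layout[(0, 0)])  # [(0, 0), (0, 2), (2, 0), (2, 2)]
```

Because ownership depends only on the tile indices modulo the grid shape, any process can compute who owns a tile without communication. The cyclic wrap also keeps the trailing submatrix spread across all processes as the factorization proceeds.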

Density matrix quantum circuit simulation via the BSP machine on modern GPU clusters

A Li, O Subasi, X Yang… - … conference for high …, 2020 - ieeexplore.ieee.org
As quantum computers evolve, simulations of quantum programs on classical computers will
be essential in validating quantum algorithms, understanding the effect of system noise, and …

Tartan: evaluating modern GPU interconnect via a multi-GPU benchmark suite

A Li, SL Song, J Chen, X Liu, N Tallent… - 2018 IEEE …, 2018 - ieeexplore.ieee.org
High performance multi-GPU computing becomes an inevitable trend due to the ever-
increasing demand on computation capability in emerging domains such as deep learning …

Porting hypre to heterogeneous computer architectures: Strategies and experiences

RD Falgout, R Li, B Sjögreen, L Wang, UM Yang - Parallel Computing, 2021 - Elsevier
Linear systems are occurring in many applications, and solving them can take a large
amount of the total simulation time. The high performance library hypre provides a variety of …

swSpTRSV: A fast sparse triangular solve with sparse level tile layout on Sunway architectures

X Wang, W Liu, W Xue, L Wu - Proceedings of the 23rd ACM SIGPLAN …, 2018 - dl.acm.org
Sparse triangular solve (SpTRSV) is one of the most important kernels in many real-world
applications. Currently, much research on parallel SpTRSV focuses on level-set construction …
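The snippet refers to level-set construction, the common baseline for parallel SpTRSV. Rows of a lower-triangular matrix are grouped so that no row depends on another row in the same group. The groups are then solved in order, and the rows inside one group are independent. The sketch below assumes CSR storage with the diagonal present in every row. The function names and the 4 × 4 example matrix are hypothetical, and swSpTRSV's sparse level tile layout is not reproduced here.

```python
from typing import List


def level_sets(n: int, indptr: List[int], indices: List[int]) -> List[List[int]]:
    """Group the rows of a lower-triangular CSR matrix into level sets.

    level[i] is 0 when row i has no off-diagonal nonzeros; otherwise it is
    1 + max(level[j]) over the nonzeros L[i, j] with j < i. Rows that share
    a level do not depend on each other.
    """
    level = [0] * n
    for i in range(n):  # rows in increasing order, so level[j] is final for j < i
        for k in range(indptr[i], indptr[i + 1]):
            j = indices[k]
            if j < i:
                level[i] = max(level[i], level[j] + 1)
    sets: List[List[int]] = [[] for _ in range(max(level, default=-1) + 1)]
    for i, lv in enumerate(level):
        sets[lv].append(i)
    return sets


def sptrsv_by_levels(n: int, indptr: List[int], indices: List[int],
                     data: List[float], b: List[float]) -> List[float]:
    """Solve L x = b one level at a time (forward substitution)."""
    x = [0.0] * n
    for rows in level_sets(n, indptr, indices):
        # Every row in `rows` could be handed to a different thread here;
        # the step from one level to the next is where a barrier would sit.
        for i in rows:
            s = b[i]
            diag = 0.0
            for k in range(indptr[i], indptr[i + 1]):
                j = indices[k]
                if j < i:
                    s -= data[k] * x[j]
                elif j == i:
                    diag = data[k]
            if diag == 0.0:
                raise ValueError(f"zero or missing diagonal in row {i}")
            x[i] = s / diag
    return x


# Hypothetical 4x4 example: row 2 depends on row 0; row 3 depends on rows 1 and 2.
indptr = [0, 1, 2, 4, 7]
indices = [0, 1, 0, 2, 1, 2, 3]
data = [2.0, 1.0, 1.0, 1.0, 3.0, 1.0, 2.0]
print(level_sets(4, indptr, indices))  # [[0, 1], [2], [3]]
print(sptrsv_by_levels(4, indptr, indices, data, [2.0, 1.0, 2.0, 8.0]))  # [1.0, 1.0, 1.0, 2.0]
```

The barrier between consecutive levels is the synchronization cost that level-set methods pay. When a matrix has many thin levels, that cost can dominate the solve.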

Fast segmented sort on GPUs

K Hou, W Liu, H Wang, W Feng - Proceedings of the International …, 2017 - dl.acm.org
Segmented sort, as a generalization of classical sort, orders a batch of independent
segments in a whole array. Along with the wider adoption of manycore processors for HPC …
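The snippet defines segmented sort as ordering a batch of independent segments stored in one array. The sketch below makes that definition concrete. It assumes the segments are described by a CSR-style offsets array, so segment s occupies values[offsets[s]:offsets[s+1]]. That representation and the function name are illustrative assumptions, and the GPU techniques of the paper are not reproduced.

```python
from typing import List


def segmented_sort(values: List[int], offsets: List[int]) -> List[int]:
    """Sort each segment values[offsets[s]:offsets[s + 1]] independently.

    offsets must start at 0, end at len(values), and be non-decreasing;
    equal neighbouring offsets describe an empty segment.
    """
    if not offsets or offsets[0] != 0 or offsets[-1] != len(values):
        raise ValueError("offsets must start at 0 and end at len(values)")
    out = list(values)
    for start, end in zip(offsets, offsets[1:]):
        if end < start:
            raise ValueError("offsets must be non-decreasing")
        # Elements never move across segment boundaries.
        out[start:end] = sorted(out[start:end])
    return out


# Segments: [5, 1, 4], [], [9, 2, 8], [7, 3]
print(segmented_sort([5, 1, 4, 9, 2, 8, 7, 3], [0, 3, 3, 6, 8]))
# [1, 4, 5, 2, 8, 9, 3, 7]
```

With a single segment ([0, len(values)]) this reduces to an ordinary sort. That is the sense in which segmented sort generalizes classical sort.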

GPU-resident sparse direct linear solvers for alternating current optimal power flow analysis

K Świrydowicz, N Koukpaizan, T Ribizel… - International Journal of …, 2024 - Elsevier
Integrating renewable resources within the transmission grid at a wide scale poses
significant challenges for economic dispatch as it requires analysis with more optimization …

Fast synchronization‐free algorithms for parallel sparse triangular solves with multiple right‐hand sides

W Liu, A Li, JD Hogg, IS Duff… - … and Computation: Practice …, 2017 - Wiley Online Library
The sparse triangular solve kernels, SpTRSV and SpTRSM, are important building blocks for
a number of numerical linear algebra routines. Parallelizing SpTRSV and SpTRSM on …
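The title's "synchronization-free" approach avoids the barrier between level sets. A common way to do that is to give each unknown a counter of unresolved dependencies, so that a component is solved as soon as its counter reaches zero. The sketch below is a sequential simulation of that general idea for multiple right-hand sides (SpTRSM). It assumes a lower-triangular L stored in CSC form and B stored as a list of rows. The function name, the queue-based scheduling, and the example matrix are illustrative assumptions, and the paper's GPU kernels are not reproduced.

```python
from collections import deque
from typing import List


def sptrsm_counter_sim(n: int, colptr: List[int], rowidx: List[int],
                       vals: List[float], B: List[List[float]]) -> List[List[float]]:
    """Solve L X = B, where B is n x m, using per-row dependency counters."""
    m = len(B[0]) if n else 0
    X = [list(row) for row in B]  # right-hand sides, updated in place
    diag = [0.0] * n
    in_degree = [0] * n  # off-diagonal nonzeros not yet applied to each row
    for j in range(n):
        for k in range(colptr[j], colptr[j + 1]):
            i = rowidx[k]
            if i == j:
                diag[j] = vals[k]
            elif i > j:
                in_degree[i] += 1
    ready = deque(i for i in range(n) if in_degree[i] == 0)
    solved = 0
    while ready:
        j = ready.popleft()
        if diag[j] == 0.0:
            raise ValueError(f"zero or missing diagonal in column {j}")
        for c in range(m):  # finish row j for every right-hand side
            X[j][c] /= diag[j]
        solved += 1
        # Push x_j into the rows below it and release rows whose counter hits 0.
        for k in range(colptr[j], colptr[j + 1]):
            i = rowidx[k]
            if i > j:
                for c in range(m):
                    X[i][c] -= vals[k] * X[j][c]
                in_degree[i] -= 1
                if in_degree[i] == 0:
                    ready.append(i)
    if solved != n:
        raise ValueError("matrix is not lower triangular")
    return X


# Hypothetical 4x4 lower-triangular L in CSC form, two right-hand sides.
colptr = [0, 2, 4, 6, 7]
rowidx = [0, 2, 1, 3, 2, 3, 3]
vals = [2.0, 1.0, 1.0, 3.0, 1.0, 1.0, 2.0]
B = [[2.0, 2.0], [1.0, 2.0], [2.0, 4.0], [8.0, 17.0]]
print(sptrsm_counter_sim(4, colptr, rowidx, vals, B))
# [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [2.0, 4.0]]
```

In a parallel setting the counters would be decremented atomically, and each row would wait only on its own counter rather than on a level-wide barrier.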

SFLU: Synchronization-free sparse LU factorization for fast circuit simulation on GPUs

J Zhao, Y Wen, Y Luo, Z **, W Liu… - 2021 58th ACM/IEEE …, 2021 - ieeexplore.ieee.org
Sparse LU factorization is one of the key building blocks of sparse direct solvers and often
dominates the computing time of circuit simulation programs. Existing GPU-accelerated …