Acceleration of tensor-product operations for high-order finite element methods

K Świrydowicz, N Chalmers… - … Journal of High …, 2019 - journals.sagepub.com
This article is devoted to graphics processing unit (GPU) kernel optimization and
performance analysis of three tensor-product operations arising in finite element methods …

Efficient parallel implementations of sparse triangular solves for GPU architectures

R Li, C Zhang - Proceedings of the 2020 SIAM Conference on Parallel …, 2020 - SIAM
The sparse triangular matrix solve (SpTrSV) is an important computation kernel that is
demanded by a variety of numerical methods such as the Gauss-Seidel iterations. However …

CSR2: a new format for SIMD-accelerated SpMV

H Bian, J Huang, R Dong, L Liu… - 2020 20th IEEE/ACM …, 2020 - ieeexplore.ieee.org
SpMV (Sparse matrix-vector multiplication) has attracted the attention of researchers in
related fields at home and abroad. Of course, improving SpMV performance has also been a …

A simple and efficient storage format for SIMD-accelerated SpMV

H Bian, J Huang, R Dong, Y Guo, L Liu, D Huang… - Cluster …, 2021 - Springer
SpMV (Sparse matrix-vector multiplication) is an essential component in scientific computing
and has attracted the attention of researchers in related fields at home and abroad. With the …

A two-level GPU-accelerated incomplete LU preconditioner for general sparse linear systems

T Xu, R Li, D Osei-Kuffuor - arXiv preprint arXiv:2303.08881, 2023 - arxiv.org
This paper presents a parallel preconditioning approach based on incomplete LU (ILU)
factorizations in the framework of Domain Decomposition (DD) for general sparse linear …

Modal FEM analysis of ferrite resonant structures

M Warecka, G Fotyga, P Kowalczyk… - IEEE Microwave and …, 2022 - ieeexplore.ieee.org
The finite-element method (FEM) is applied for modal analysis of ferrite-loaded spherical
resonators. To improve the efficiency of the numerical calculations, the body-of-revolution …

On parallel solution of sparse triangular linear systems in CUDA

R Li - arXiv preprint arXiv:1710.04985, 2017 - arxiv.org
The acceleration of sparse matrix computations on modern many-core processors, such as
the graphics processing units (GPUs), has been recognized and studied over a decade …

Evaluation of directive-based GPU programming models on a block Eigensolver with consideration of large sparse matrices

F Rabbi, CS Daley, HM Aktulga, NJ Wright - International Workshop on …, 2019 - Springer
Achieving high performance and performance portability for large-scale scientific
applications is a major challenge on heterogeneous computing systems such as many-core …

CAMLB-SpMV: An Efficient Cache-Aware Memory Load-Balancing SpMV on CPU

J Guo, R Xia, J Liu, X Zhu, X Zhang - Proceedings of the 53rd …, 2024 - dl.acm.org
Sparse Matrix-Vector Multiplication (SpMV) plays a crucial role in scientific computing, but
severe load imbalance among threads restricts its performance. Previous load-balancing …

Block conjugate-gradient method with multilevel preconditioning and GPU acceleration for FEM problems in electromagnetics

A Dziekonski, M Mrozowski - IEEE Antennas and Wireless …, 2018 - ieeexplore.ieee.org
In this letter, a graphics processing unit (GPU)-accelerated block conjugate-gradient solver
with multilevel preconditioning is presented for solving a large system of sparse equations …