Acceleration of tensor-product operations for high-order finite element methods
This article is devoted to graphics processing unit (GPU) kernel optimization and
performance analysis of three tensor-product operations arising in finite element methods …
Efficient parallel implementations of sparse triangular solves for GPU architectures
The sparse triangular matrix solve (SpTrSV) is an important computation kernel that is
demanded by a variety of numerical methods such as the Gauss-Seidel iterations. However …
CSR2: a new format for SIMD-accelerated SpMV
SpMV (Sparse matrix-vector multiplication) has attracted the attention of researchers in
related fields at home and abroad. Of course, improving SpMV performance has also been a …
A simple and efficient storage format for SIMD-accelerated SpMV
SpMV (Sparse matrix-vector multiplication) is an essential component in scientific computing
and has attracted the attention of researchers in related fields at home and abroad. With the …
A two-level GPU-accelerated incomplete LU preconditioner for general sparse linear systems
This paper presents a parallel preconditioning approach based on incomplete LU (ILU)
factorizations in the framework of Domain Decomposition (DD) for general sparse linear …
Modal FEM analysis of ferrite resonant structures
The finite-element method (FEM) is applied for modal analysis of ferrite-loaded spherical
resonators. To improve the efficiency of the numerical calculations, the body-of-revolution …
On parallel solution of sparse triangular linear systems in CUDA
R Li - arXiv preprint arXiv:1710.04985, 2017 - arxiv.org
The acceleration of sparse matrix computations on modern many-core processors, such as
the graphics processing units (GPUs), has been recognized and studied over a decade …
Evaluation of directive-based GPU programming models on a block Eigensolver with consideration of large sparse matrices
Achieving high performance and performance portability for large-scale scientific
applications is a major challenge on heterogeneous computing systems such as many-core …
CAMLB-SpMV: An Efficient Cache-Aware Memory Load-Balancing SpMV on CPU
J Guo, R Xia, J Liu, X Zhu, X Zhang - Proceedings of the 53rd …, 2024 - dl.acm.org
Sparse Matrix-Vector Multiplication (SpMV) plays a crucial role in scientific computing, but
severe load imbalance among threads restricts its performance. Previous load-balancing …
Block conjugate-gradient method with multilevel preconditioning and GPU acceleration for FEM problems in electromagnetics
A Dziekonski, M Mrozowski - IEEE Antennas and Wireless …, 2018 - ieeexplore.ieee.org
In this letter, a graphics processing unit (GPU)-accelerated block conjugate-gradient solver
with multilevel preconditioning is presented for solving a large system of sparse equations …