Matrix-Free High-Performance Saddle-Point Solvers for High-Order Problems in H(div)

W Pazner, T Kolev, PS Vassilevski - SIAM Journal on Scientific Computing, 2024 - SIAM
This work describes the development of matrix-free GPU-accelerated solvers for high-order
finite element problems in H(div). The solvers are applicable to grad-div and Darcy problems in …

RETRACTED: Batched matrix computations on hardware accelerators based on GPUs

A Haidar, T Dong, P Luszczek… - … Journal of High …, 2015 - journals.sagepub.com
Scientific applications require solvers that work on many small size problems that are
independent from each other. At the same time, the high-end hardware evolves rapidly and …

A set of batched basic linear algebra subprograms and LAPACK routines

A Abdelfattah, T Costa, J Dongarra, M Gates… - ACM Transactions on …, 2021 - dl.acm.org
This article describes a standard API for a set of Batched Basic Linear Algebra Subprograms
(Batched BLAS or BBLAS). The focus is on many independent BLAS operations on small …

A framework for batched and GPU-resident factorization algorithms applied to block Householder transformations

A Haidar, TT Dong, S Tomov, P Luszczek… - … Conference, ISC High …, 2015 - Springer
As modern hardware keeps evolving, an increasingly effective approach to developing
energy efficient and high-performance solvers is to design them to work on many small size …

LU factorization of small matrices: Accelerating batched DGETRF on the GPU

T Dong, A Haidar, P Luszczek, JA Harris… - 2014 IEEE Intl Conf …, 2014 - ieeexplore.ieee.org
Gaussian Elimination is commonly used to solve dense linear systems in scientific models.
In a large number of applications, a need arises to solve many small size problems, instead …

Implementation and tuning of batched Cholesky factorization and solve for NVIDIA GPUs

J Kurzak, H Anzt, M Gates… - IEEE Transactions on …, 2015 - ieeexplore.ieee.org
Many problems in engineering and scientific computing require the solution of a large
number of small systems of linear equations. Due to their high processing power, Graphics …

A guide for achieving high performance with very small matrices on GPU: a case study of batched LU and Cholesky factorizations

A Haidar, A Abdelfattah, M Zounon… - … on Parallel and …, 2017 - ieeexplore.ieee.org
We present a high-performance GPU kernel with a substantial speedup over vendor
libraries for very small matrix computations. In addition, we discuss most of the challenges …

Novel HPC techniques to batch execution of many variable size BLAS computations on GPUs

A Abdelfattah, A Haidar, S Tomov… - Proceedings of the …, 2017 - dl.acm.org
This paper presents a software framework for solving large numbers of relatively small
matrix problems using GPUs. Our approach combines novel and existing HPC techniques to …

Fast Cholesky factorization on GPUs for batch and native modes in MAGMA

A Abdelfattah, A Haidar, S Tomov… - Journal of Computational …, 2017 - Elsevier
This paper presents a GPU-accelerated Cholesky factorization for two different modes of
operation. The first one is the batch mode, where many independent factorizations on small …

Linear algebra software for large-scale accelerated multicore computing

A Abdelfattah, H Anzt, J Dongarra, M Gates… - Acta Numerica, 2016 - cambridge.org
Many crucial scientific computing applications, ranging from national security to medical
advances, rely on high-performance linear algebra algorithms and technologies …