The singular value decomposition: Anatomy of optimizing an algorithm for extreme scale

J Dongarra, M Gates, A Haidar, J Kurzak, P Luszczek… - SIAM Review, 2018 - SIAM
The computation of the singular value decomposition, or SVD, has a long history with many
improvements over the years, both in its implementations and algorithmically. Here, we …

Accelerating numerical dense linear algebra calculations with GPUs

J Dongarra, M Gates, A Haidar, J Kurzak… - … computations with GPUs, 2014 - Springer
This chapter presents the current best design and implementation practices for the
acceleration of dense linear algebra (DLA) on GPUs. Examples are given with fundamental …

A survey of recent developments in parallel implementations of Gaussian elimination

S Donfack, J Dongarra, M Faverge… - Concurrency and …, 2015 - Wiley Online Library
Gaussian elimination is a canonical linear algebra procedure for solving linear systems of
equations. In the last few years, the algorithm has received a lot of attention in an attempt to …

Polynomial chaos expansion of random coefficients and the solution of stochastic partial differential equations in the tensor train format

S Dolgov, BN Khoromskij, A Litvinenko… - SIAM/ASA Journal on …, 2015 - SIAM
We apply the tensor train (TT) decomposition to construct the tensor product polynomial
chaos expansion (PCE) of a random field, to solve the stochastic elliptic diffusion PDE with …

Reasoning about functional programs and complexity classes associated with type disciplines

D Leivant - 24th Annual Symposium on Foundations of …, 1983 - ieeexplore.ieee.org
We present a method of reasoning directly about functional programs in Second-Order
Logic, based on the use of explicit second-order definitions for inductively defined data …

Optimizations of the eigensolvers in the ELPA library

P Kůs, A Marek, SS Köcher, HH Kowalski… - Parallel Computing, 2019 - Elsevier
The solution of (generalized) eigenvalue problems for symmetric or Hermitian matrices is a
common subtask of many numerical calculations in electronic structure theory or materials …

High-performance sampling of generic determinantal point processes

J Poulson - … Transactions of the Royal Society A, 2020 - royalsocietypublishing.org
Determinantal point processes (DPPs) were introduced by Macchi (Macchi 1975 Adv. Appl.
Probab. 7, 83–122) as a model for repulsive (fermionic) particle distributions. But their recent …

High-performance SVD partial spectrum computation

D Keyes, H Ltaief, Y Nakatsukasa… - Proceedings of the …, 2023 - dl.acm.org
We introduce a new singular value decomposition (SVD) solver based on the QR-based
Dynamically Weighted Halley (QDWH) algorithm for computing the partial spectrum SVD …

OpenMP target task: Tasking and target offloading on heterogeneous systems

P Valero-Lara, J Kim, O Hernandez, J Vetter - European Conference on …, 2021 - Springer
This work evaluated the use of OpenMP tasking with target GPU offloading as a potential
solution for programming productivity and performance on heterogeneous systems. Also, it …

Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting

J Dongarra, M Faverge, H Ltaief… - Concurrency and …, 2014 - Wiley Online Library
The LU factorization is an important numerical algorithm for solving systems of linear
equations in science and engineering and is a characteristic of many dense linear algebra …