PLASMA: Parallel linear algebra software for multicore using OpenMP
The recent version of the Parallel Linear Algebra Software for Multicore Architectures
(PLASMA) library is based on tasks with dependencies from the OpenMP standard. The …
Porting the PLASMA numerical library to the OpenMP standard
PLASMA is a numerical library intended as a successor to LAPACK for solving problems in
dense linear algebra on multicore processors. PLASMA relies on the QUARK scheduler for …
Dynamic task execution on shared and distributed memory architectures
A YarKhan - 2012 - trace.tennessee.edu
Multicore architectures with high core counts have come to dominate the world of high
performance computing, from shared memory machines to the largest distributed memory …
Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting
The LU factorization is an important numerical algorithm for solving systems of linear
equations in science and engineering and is a characteristic of many dense linear algebra …
Investigating applications portability with the Uintah DAG-based runtime system on petascale supercomputers
Q Meng, A Humphrey, J Schmidt… - Proceedings of the …, 2013 - dl.acm.org
Present trends in high performance computing present formidable challenges for
applications code using multicore nodes possibly with accelerators and/or co-processors …
An improved parallel singular value algorithm and its implementation for multicore hardware
The enormous gap between the high-performance capabilities of today's CPUs and off-chip
communication poses extreme challenges to the development of numerical software that is …
Efficient block algorithms for parallel sparse triangular solve
The sparse triangular solve (SpTRSV) kernel is an important building block for a number of
linear algebra routines such as sparse direct and iterative solvers. The major challenge of …
LU factorization with partial pivoting for a multicore system with accelerators
LU factorization with partial pivoting is a canonical numerical procedure and the main
component of the high performance LINPACK benchmark. This paper presents an …
Dense linear algebra on distributed heterogeneous hardware with a symbolic DAG approach
G Bosilca - 2012 - escholarship.org
While the first two involve fundamental physical limitations that current technology trends are
unlikely to overcome in the near term, the third is an obvious consequence of the first two …
High performance matrix inversion based on LU factorization for multicore architectures
The goal of this paper is to present an efficient implementation of an explicit matrix inversion
of general square matrices on multicore computer architecture. The inversion procedure is …