[BOOK][B] Structured parallel programming: patterns for efficient computation

M McCool, J Reinders, A Robison - 2012 - books.google.com
Structured Parallel Programming offers the simplest way for developers to learn patterns for
high-performance parallel programming. Written by parallel computing experts and industry …

Towards dense linear algebra for hybrid GPU accelerated manycore systems

S Tomov, J Dongarra, M Baboulin - Parallel Computing, 2010 - Elsevier
We highlight the trends leading to the increased appeal of using hybrid multicore+GPU
systems for high performance computing. We present a set of techniques that can be used to …

Communication-optimal parallel and sequential QR and LU factorizations

J Demmel, L Grigori, M Hoemmen, J Langou - SIAM Journal on Scientific …, 2012 - SIAM
We present parallel and sequential dense QR factorization algorithms that are both optimal
(up to polylogarithmic factors) in the amount of communication they perform and just as …

Communication-optimal parallel 2.5D matrix multiplication and LU factorization algorithms

E Solomonik, J Demmel - European Conference on Parallel Processing, 2011 - Springer
Extra memory allows parallel matrix multiplication to be done with asymptotically less
communication than Cannon's algorithm and be faster in practice. “3D” algorithms arrange …

[BOOK][B] Communication-avoiding Krylov subspace methods

M Hoemmen - 2010 - search.proquest.com
Krylov subspace methods (KSMs) are iterative algorithms for solving large, sparse linear
systems and eigenvalue problems. Current KSMs rely on sparse matrix-vector multiply …

Gaussian elimination

NJ Higham - Wiley Interdisciplinary Reviews: Computational …, 2011 - Wiley Online Library
As the standard method for solving systems of linear equations, Gaussian elimination (GE) is
one of the most important and ubiquitous numerical algorithms. However, its successful use …

A survey of recent developments in parallel implementations of Gaussian elimination

S Donfack, J Dongarra, M Faverge… - Concurrency and …, 2015 - Wiley Online Library
Gaussian elimination is a canonical linear algebra procedure for solving linear systems of
equations. In the last few years, the algorithm has received a lot of attention in an attempt to …

Graph expansion and communication costs of fast matrix multiplication

G Ballard, J Demmel, O Holtz, O Schwartz - Journal of the ACM (JACM), 2013 - dl.acm.org
The communication cost of algorithms (also known as I/O-complexity) is shown to be closely
related to the expansion properties of the corresponding computation graphs. We …

Communication avoiding rank revealing QR factorization with column pivoting

JW Demmel, L Grigori, M Gu, H Xiang - SIAM Journal on Matrix Analysis and …, 2015 - SIAM
In this paper we introduce CARRQR, a communication avoiding rank revealing QR
factorization with tournament pivoting. We show that CARRQR reveals the numerical rank of …

Enabling and scaling matrix computations on heterogeneous multi-core and multi-GPU systems

F Song, S Tomov, J Dongarra - … of the 26th ACM international conference …, 2012 - dl.acm.org
We present a new approach to utilizing all CPU cores and all GPUs on heterogeneous
multicore and multi-GPU systems to support dense matrix computations efficiently. The main …