Communication lower bounds and optimal algorithms for numerical linear algebra

G Ballard, E Carson, J Demmel, M Hoemmen… - Acta Numerica, 2014 - cambridge.org
The traditional metric for the efficiency of a numerical algorithm has been the number of
arithmetic operations it performs. Technological trends have long been reducing the time to …

The sparse polyhedral framework: Composing compiler-generated inspector-executor code

MM Strout, M Hall, C Olschanowsky - Proceedings of the IEEE, 2018 - ieeexplore.ieee.org
Irregular applications such as big graph analysis, material simulations, molecular dynamics
simulations, and finite element analysis have performance problems due to their use of …

Communication-optimal parallel and sequential QR and LU factorizations

J Demmel, L Grigori, M Hoemmen, J Langou - SIAM Journal on Scientific …, 2012 - SIAM
We present parallel and sequential dense QR factorization algorithms that are both optimal
(up to polylogarithmic factors) in the amount of communication they perform and just as …

Reducing communication in graph neural network training

A Tripathy, K Yelick, A Buluç - SC20: International Conference …, 2020 - ieeexplore.ieee.org
Graph Neural Networks (GNNs) are powerful and flexible neural networks that use the
naturally sparse connectivity information of the data. GNNs represent this connectivity as …

[BOOK][B] Communication-avoiding Krylov subspace methods

M Hoemmen - 2010 - search.proquest.com
Krylov subspace methods (KSMs) are iterative algorithms for solving large, sparse linear
systems and eigenvalue problems. Current KSMs rely on sparse matrix-vector multiply …

Tiled QR factorization algorithms

H Bouwmeester, M Jacquelin, J Langou… - Proceedings of 2011 …, 2011 - dl.acm.org
This work revisits existing algorithms for the QR factorization of rectangular matrices
composed of p × q tiles, where p ≥ q. Within this framework, we study the critical paths and …

[BOOK][B] Communication-avoiding Krylov subspace methods in theory and practice

EC Carson - 2015 - search.proquest.com
Advancements in the field of high-performance scientific computing are necessary to
address the most important challenges we face in the 21st century. From physical modeling …

Communication-avoiding QR decomposition for GPUs

M Anderson, G Ballard, J Demmel… - 2011 IEEE International …, 2011 - ieeexplore.ieee.org
We describe an implementation of the Communication-Avoiding QR (CAQR) factorization
that runs entirely on a single graphics processor (GPU). We show that the reduction in …

Parallel algorithms for tensor train arithmetic

HA Daas, G Ballard, P Benner - SIAM Journal on Scientific Computing, 2022 - SIAM
We present efficient and scalable parallel algorithms for performing mathematical operations
for low-rank tensors represented in the tensor train (TT) format. We consider algorithms for …

Shifted Cholesky QR for computing the QR factorization of ill-conditioned matrices

T Fukaya, R Kannan, Y Nakatsukasa… - SIAM Journal on …, 2020 - SIAM
The Cholesky QR algorithm is an efficient communication-minimizing algorithm for
computing the QR factorization of a tall-skinny matrix X ∈ R^{m×n}, where m ≫ n. Unfortunately …