Communication lower bounds and optimal algorithms for numerical linear algebra
The traditional metric for the efficiency of a numerical algorithm has been the number of
arithmetic operations it performs. Technological trends have long been reducing the time to …
Faster all-pairs shortest paths via circuit complexity
R Williams - Proceedings of the forty-sixth annual ACM symposium …, 2014 - dl.acm.org
We present a new randomized method for computing the min-plus product (a.k.a. tropical
product) of two n × n matrices, yielding a faster algorithm for solving the all-pairs shortest …
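The min-plus (tropical) product mentioned in the snippet is a concrete operation worth pinning down: it replaces the usual (+, ×) of matrix multiplication with (min, +), so one "squaring" combines two-hop shortest-path distances. A minimal sketch (plain NumPy, not the paper's randomized method):

```python
import numpy as np

def min_plus_product(A, B):
    """Min-plus (tropical) product: C[i,j] = min_k (A[i,k] + B[k,j]).

    Repeatedly min-plus-squaring a graph's weighted adjacency matrix
    converges to its all-pairs shortest-path distances, which is why a
    faster min-plus product yields a faster APSP algorithm.
    """
    n, m = A.shape
    m2, p = B.shape
    assert m == m2, "inner dimensions must agree"
    C = np.empty((n, p))
    for i in range(n):
        # For row i: add A[i,k] to column entries B[k,j], minimize over k.
        C[i] = np.min(A[i, :, None] + B, axis=0)
    return C
```

For example, with A = [[0, 1], [2, 0]] and B = [[0, 3], [1, 0]], the entry C[0, 1] is min(0 + 3, 1 + 0) = 1.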
AUGEM: automatically generate high performance dense linear algebra kernels on x86 CPUs
Basic Linear Algebra Subprograms (BLAS) is a fundamental library in scientific computing. In
this paper, we present a template-based optimization framework, AUGEM, which can …
Algebraic methods in the congested clique
In this work, we use algebraic methods for studying distance computation and subgraph
detection tasks in the congested clique model. Specifically, we adapt parallel matrix …
Communication-optimal parallel recursive rectangular matrix multiplication
Communication-optimal algorithms are known for square matrix multiplication. Here, we
obtain the first communication-optimal algorithm for all dimensions of rectangular matrices …
A framework for practical parallel fast matrix multiplication
Matrix multiplication is a fundamental computation in many scientific disciplines. In this
paper, we show that novel fast matrix multiplication algorithms can significantly outperform …
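The "fast matrix multiplication algorithms" this entry refers to are Strassen-like schemes that trade 8 block products for 7. A minimal one-level-recursive Strassen sketch (assuming square matrices with power-of-two dimension; the cutoff fallback to ordinary multiplication mirrors what practical implementations do, but this is an illustration, not the paper's framework):

```python
import numpy as np

def strassen(A, B, cutoff=64):
    """Strassen multiplication of n x n matrices, n a power of two.

    Each level forms 7 recursive products (M1..M7) instead of 8,
    giving the subcubic O(n^2.807) arithmetic bound; below `cutoff`
    it falls back to classical multiplication.
    """
    n = A.shape[0]
    if n <= cutoff:
        return A @ B
    h = n // 2
    A11, A12, A21, A22 = A[:h, :h], A[:h, h:], A[h:, :h], A[h:, h:]
    B11, B12, B21, B22 = B[:h, :h], B[:h, h:], B[h:, :h], B[h:, h:]
    M1 = strassen(A11 + A22, B11 + B22, cutoff)
    M2 = strassen(A21 + A22, B11, cutoff)
    M3 = strassen(A11, B12 - B22, cutoff)
    M4 = strassen(A22, B21 - B11, cutoff)
    M5 = strassen(A11 + A12, B22, cutoff)
    M6 = strassen(A21 - A11, B11 + B12, cutoff)
    M7 = strassen(A12 - A22, B21 + B22, cutoff)
    C = np.empty_like(A)
    C[:h, :h] = M1 + M4 - M5 + M7
    C[:h, h:] = M3 + M5
    C[h:, :h] = M2 + M4
    C[h:, h:] = M1 - M2 + M3 + M6
    return C
```

The cutoff matters in practice: the extra additions and reduced locality make Strassen pay off only above a machine-dependent block size.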
Graph expansion and communication costs of fast matrix multiplication
The communication cost of algorithms (also known as I/O-complexity) is shown to be closely
related to the expansion properties of the corresponding computation graphs. We …
Communication optimal parallel multiplication of sparse random matrices
Parallel algorithms for sparse matrix-matrix multiplication typically spend most of their time
on inter-processor communication rather than on computation, and hardware trends predict …
Pebbling Game and Alternative Basis for High Performance Matrix Multiplication
Matrix multiplication is one of the most extensively used kernels in scientific computing.
Although subcubic algorithms exist, most high performance implementations are based on …
Scalable graph convolutional network training on distributed-memory systems
Graph Convolutional Networks (GCNs) are extensively utilized for deep learning on graphs.
The large data sizes of graphs and their vertex features make scalable training algorithms …