Communication-optimal parallel algorithm for Strassen's matrix multiplication
Parallel matrix multiplication is one of the most studied fundamental problems in distributed
and high performance computing. We obtain a new parallel algorithm that is based on …
Graph expansion and communication costs of fast matrix multiplication
The communication cost of algorithms (also known as I/O-complexity) is shown to be closely
related to the expansion properties of the corresponding computation graphs. We …
Pebbling Game and Alternative Basis for High Performance Matrix Multiplication
O Schwartz, N Vaknin - SIAM Journal on Scientific Computing, 2023 - SIAM
Matrix multiplication is one of the most extensively used kernels in scientific computing.
Although subcubic algorithms exist, most high performance implementations are based on …
Matrix multiplication, a little faster
E Karstadt, O Schwartz - Journal of the ACM (JACM), 2020 - dl.acm.org
Strassen's algorithm (1969) was the first sub-cubic matrix multiplication algorithm. Winograd
(1971) improved the leading coefficient of its complexity from 7 to 6. There have been many …
Multifrontal methods: parallelism, memory usage and numerical aspects
JY L'Excellent - 2012 - theses.hal.science
Direct methods for the solution of sparse systems of linear equations are used in a wide
range of numerical simulation applications. Such methods are based on the decomposition …
A Matrix–Matrix Multiplication methodology for single/multi-core architectures using SIMD
V Kelefouras, A Kritikakou, C Goutis - The Journal of Supercomputing, 2014 - Springer
In this paper, a new methodology for speeding up Matrix–Matrix Multiplication using Single
Instruction Multiple Data unit, at one and more cores having a shared cache, is presented …
A high-performance matrix–matrix multiplication methodology for CPU and GPU architectures
Current compilers cannot generate code that can compete with hand-tuned code in
efficiency, even for a simple kernel like matrix–matrix multiplication (MMM). A key step in …
Efficiently parallelizable Strassen-based multiplication of a matrix by its transpose
The multiplication of a matrix by its transpose, A^T A, appears as an intermediate operation in
the solution of a wide set of problems. In this paper, we propose a new cache-oblivious …
Stark: Fast and scalable Strassen's matrix multiplication using Apache Spark
C Misra, S Bhattacharya… - IEEE Transactions on Big …, 2020 - ieeexplore.ieee.org
This article presents a new fast, highly scalable distributed matrix multiplication algorithm on
Apache Spark, called Stark, based on Strassen's matrix multiplication algorithm. Stark …
Alternative Basis Matrix Multiplication is fast and stable
Alternative basis matrix multiplication algorithms are the fastest matrix multiplication
algorithms in practice to date. However, are they numerically stable? We obtain the first …