The Impact of Space-Filling Curves on Data Movement in Parallel Systems
Modern computer systems are characterized by deep memory hierarchies, composed of
main memory, multiple layers of cache, and other specialized types of memory. In parallel …
Distributed and heterogeneous tensor–vector contraction algorithms for high performance computing
The tensor–vector contraction (TVC) is the most memory-bound operation of its class and a
core component of the higher-order power method (HOPM). This paper brings distributed …
A heterogeneous parallel computing approach optimizing SpTTM on CPU-GPU via GCN
Sparse Tensor-Times-Matrix (SpTTM) is the core calculation in tensor analysis. The sparse
distributions of different tensors vary greatly, which poses a big challenge to designing …
Using Evolutionary Algorithms to Find Cache-Friendly Generalized Morton Layouts for Arrays
The layout of multi-dimensional data can have a significant impact on the efficacy of
hardware caches and, by extension, the performance of applications. Common multi …
Finding Morton-Like Layouts for Multi-Dimensional Arrays Using Evolutionary Algorithms
The layout of multi-dimensional data can have a significant impact on the efficacy of
hardware caches and, by extension, the performance of applications. Common multi …
Improved Data Locality Using Morton-order Curve on the Example of LU Decomposition
The LU decomposition is an essential element used in many linear algebra applications.
Furthermore, it is used in LINPACK to benchmark the performance of modern multi-core …
Fast and Layout-Oblivious Tensor-Matrix Multiplication with BLAS
CS Başsoy - International Conference on Computational Science, 2024 - Springer
The tensor-matrix multiplication is a basic tensor operation required by various tensor
methods such as the ALS and the HOSVD. This paper presents flexible high-performance …
High performance tensor–vector multiplication on shared-memory systems
Tensor–vector multiplication is one of the core components in tensor computations. We have
recently investigated high performance, single core implementation of this bandwidth-bound …
A native tensor–vector multiplication algorithm for high performance computing
PJ Martinez-Ferrer, AN Yzelman… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Tensor computations are important mathematical operations for applications that rely on
multidimensional data. The tensor–vector multiplication (TVM) is the most memory-bound …
High performance tensor-vector multiplies on shared memory systems
Tensor–vector multiplication is one of the core components in tensor computations. We have
recently investigated high performance, single core implementation of this bandwidth-bound …