DASP: Specific dense matrix multiply-accumulate units accelerated general sparse matrix-vector multiplication

Y Lu, W Liu - Proceedings of the International Conference for High …, 2023 - dl.acm.org
Sparse matrix-vector multiplication (SpMV) plays a key role in computational science and
engineering, graph processing, and machine learning applications. Much work on SpMV …

Thermodynamic matrix exponentials and thermodynamic parallelism

S Duffield, M Aifer, G Crooks, T Ahle, PJ Coles - Physical Review Research, 2025 - APS
Thermodynamic computing exploits fluctuations and dissipation in physical systems to
efficiently solve various mathematical problems. It was recently shown that certain linear …

Accelerating ML workloads using GPU tensor cores: The good, the bad, and the ugly

B Hanindhito, LK John - Proceedings of the 15th ACM/SPEC …, 2024 - dl.acm.org
Machine Learning (ML) workloads generally contain a significant amount of matrix
computations; hence, hardware accelerators for ML have been incorporating support for …

A novel parallel algorithm for sparse tensor matrix chain multiplication via TCU-acceleration

H Wang, W Yang, R Hu, R Ouyang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Analysis of multi-dimensional data, especially tensor decomposition, which extracts latent
information, is becoming considerably popular. Although multi-dimensional sparse data is …

Similarity search with tensor core units

TD Ahle, F Silvestri - International Conference on Similarity Search and …, 2020 - Springer
Tensor Core Units (TCUs) are hardware accelerators developed for deep neural
networks, which efficiently support the multiplication of two dense √m × √m matrices, where m is …

Blocking techniques for sparse matrix multiplication on tensor accelerators

PS Labini, M Bernaschi, F Silvestri, F Vella - arXiv preprint arXiv …, 2022 - arxiv.org
Tensor accelerators have gained popularity because they provide a cheap and efficient
solution for speeding up computational-expensive tasks in Deep Learning and, more …

Parallelizing filter-and-verification based exact set similarity joins on multicores

F Fier, JC Freytag - Information Systems, 2022 - Elsevier
Set similarity join (SSJ) is a well studied problem with many algorithms proposed to speed
up its performance. However, its scalability and performance are rarely discussed in modern …

Interpret: Inter-warp register reuse for GPU tensor core

JS Kwak, MK Yoon, I Jeong, S Jin… - 2023 32nd International …, 2023 - ieeexplore.ieee.org
Tensor cores in the recent NVIDIA GPUs are under the spotlight due to their superior
computation throughput for general matrix-matrix multiplication (GEMM) that has been …

Accelerating finite impulse response filtering using tensor cores

T Kondo, Y Maeda, N Fukushima - 2021 Asia-Pacific Signal …, 2021 - ieeexplore.ieee.org
This paper studies how to accelerate a single channel 2D image convolution using NVIDIA's
Tensor Core. Tensor Core is a dedicated arithmetic unit for speeding up matrix products and …

A parallel scan algorithm in the tensor core unit model

A Zouzias, WF McColl - European Conference on Parallel Processing, 2023 - Springer
We present a parallel scan (prefix sum) algorithm in the Tensor Core Unit (TCU) model of
computation. The TCU model assumes that multiplication between two square matrices of …