Google Scholar

DASP: Specific dense matrix multiply-accumulate units accelerated general sparse matrix-vector multiplication

Y Lu, W Liu - Proceedings of the International Conference for High …, 2023 - dl.acm.org

Sparse matrix-vector multiplication (SpMV) plays a key role in computational science and
engineering, graph processing, and machine learning applications. Much work on SpMV …

Cited by 11 · All 4 versions

[PDF] aps.org

Thermodynamic matrix exponentials and thermodynamic parallelism

S Duffield, M Aifer, G Crooks, T Ahle, PJ Coles - Physical Review Research, 2025 - APS

Thermodynamic computing exploits fluctuations and dissipation in physical systems to
efficiently solve various mathematical problems. It was recently shown that certain linear …

Cited by 8 · All 4 versions

[PDF] utexas.edu

Accelerating ML workloads using GPU tensor cores: The good, the bad, and the ugly

B Hanindhito, LK John - Proceedings of the 15th ACM/SPEC …, 2024 - dl.acm.org

Machine Learning (ML) workloads generally contain a significant amount of matrix
computations; hence, hardware accelerators for ML have been incorporating support for …

Cited by 5 · All 6 versions

A novel parallel algorithm for sparse tensor matrix chain multiplication via TCU-acceleration

H Wang, W Yang, R Hu, R Ouyang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Analysis of multi-dimensional data, especially tensor decomposition, which extracts latent
information, is becoming considerably popular. Although multi-dimensional sparse data is …

Cited by 7 · All 3 versions

[PDF] arxiv.org

Similarity search with tensor core units

TD Ahle, F Silvestri - International Conference on Similarity Search and …, 2020 - Springer

Tensor Core Units (TCUs) are hardware accelerators developed for deep neural
networks, which efficiently support the multiplication of two dense m × m matrices, where m is …

Cited by 10 · All 9 versions

[PDF] arxiv.org

Blocking techniques for sparse matrix multiplication on tensor accelerators

PS Labini, M Bernaschi, F Silvestri, F Vella - arXiv preprint arXiv …, 2022 - arxiv.org

Tensor accelerators have gained popularity because they provide a cheap and efficient
solution for speeding up computationally expensive tasks in Deep Learning and, more …

Cited by 5 · All 4 versions · HTML version

Parallelizing filter-and-verification based exact set similarity joins on multicores

F Fier, JC Freytag - Information Systems, 2022 - Elsevier

Set similarity join (SSJ) is a well studied problem with many algorithms proposed to speed
up its performance. However, its scalability and performance are rarely discussed in modern …

Cited by 8 · All 5 versions

Interpret: Inter-warp register reuse for GPU tensor core

JS Kwak, MK Yoon, I Jeong, S **… - 2023 32nd International …, 2023 - ieeexplore.ieee.org

Tensor cores in the recent NVIDIA GPUs are under the spotlight due to their superior
computation throughput for general matrix-matrix multiplication (GEMM) that has been …

Cited by 1 · All 5 versions

[PDF] researchgate.net

Accelerating finite impulse response filtering using tensor cores

T Kondo, Y Maeda, N Fukushima - 2021 Asia-Pacific Signal …, 2021 - ieeexplore.ieee.org

This paper studies how to accelerate a single channel 2D image convolution using NVIDIA's
Tensor Core. Tensor Core is a dedicated arithmetic unit for speeding up matrix products and …

Cited by 5 · All 6 versions
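The entry above describes using Tensor Cores, units built to speed up matrix products, for single-channel 2D image convolution. The general idea of recasting a filter as a matrix product can be sketched with an im2col-style reformulation. This is an illustrative sketch in plain Python, not the method of Kondo et al.; it assumes valid-mode filtering implemented as correlation (kernel not flipped), and the names `im2col` and `filter2d_as_matmul` are hypothetical.

```python
def im2col(img, kh, kw):
    """Flatten every kh-by-kw window of img (valid region only) into one row."""
    h, w = len(img), len(img[0])
    return [
        [img[i + di][j + dj] for di in range(kh) for dj in range(kw)]
        for i in range(h - kh + 1)
        for j in range(w - kw + 1)
    ]


def filter2d_as_matmul(img, kernel):
    """Valid-mode 2D filtering (correlation) computed as patches @ kernel_vector.

    Hypothetical helper: the patch matrix has one row per output pixel, so the
    whole filter reduces to a single matrix-vector product.
    """
    kh, kw = len(kernel), len(kernel[0])
    patches = im2col(img, kh, kw)              # (out_h * out_w) x (kh * kw)
    kvec = [v for row in kernel for v in row]  # kernel flattened row-major
    flat = [sum(p * k for p, k in zip(patch, kvec)) for patch in patches]
    out_w = len(img[0]) - kw + 1
    # Reshape the flat result back into out_h rows of out_w pixels
    return [flat[r:r + out_w] for r in range(0, len(flat), out_w)]


print(filter2d_as_matmul([[1, 2, 3], [4, 5, 6], [7, 8, 9]], [[1, 0], [0, 1]]))  # → [[6, 8], [12, 14]]
```

With one kernel this is a matrix-vector product; stacking several kernels as columns of a second matrix turns it into the matrix-matrix product that Tensor Cores are designed to accelerate.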

[PDF] arxiv.org

A parallel scan algorithm in the tensor core unit model

A Zouzias, WF McColl - European Conference on Parallel Processing, 2023 - Springer

We present a parallel scan (prefix sum) algorithm in the Tensor Core Unit (TCU) model of
computation. The TCU model assumes that multiplication between two square matrices of …

Cited by 1 · All 5 versions
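The TCU model in the entry above treats the product of two square matrices as the basic operation. One way to route a prefix sum through such products is to multiply blocks of s inputs by an s × s upper-triangular matrix of ones, then add the totals of earlier blocks. The sketch below illustrates only that general idea and is not Zouzias and McColl's algorithm. It assumes that the plain `matmul` stands in for a TCU multiply and uses a simple sequential carry pass for brevity; all function names are hypothetical.

```python
def matmul(a, b):
    """Plain dense matrix product; a stand-in for a TCU multiply (assumption)."""
    k, m = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(k)) for j in range(m)] for row in a]


def upper_ones(s):
    # U[t][j] = 1 if t <= j, so (row @ U)[j] = row[0] + ... + row[j]
    return [[1 if t <= j else 0 for j in range(s)] for t in range(s)]


def blocked_scan(x, s):
    """Inclusive prefix sum of x using one (n/s x s) by (s x s) product plus carries."""
    n = len(x)
    padded = x + [0] * (-n % s)  # pad with zeros to a multiple of s
    rows = [padded[i:i + s] for i in range(0, len(padded), s)]
    # A single matrix product scans every length-s block at once
    local = matmul(rows, upper_ones(s))
    # Carry pass: shift each block by the running total of all earlier blocks
    out, carry = [], 0
    for row in local:
        out.extend(v + carry for v in row)
        carry += row[-1]  # last entry of a scanned block is that block's total
    return out[:n]


print(blocked_scan([3, 1, 4, 1, 5, 9, 2, 6], 3))  # → [3, 4, 8, 9, 14, 23, 25, 31]
```

In a TCU setting the carry step would itself be expressed with matrix products rather than the Python loop used here. The loop only keeps the sketch short.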

A computational model for tensor core units
