Google Scholar

Simple hardware-efficient long convolutions for sequence modeling

DY Fu, EL Epstein, E Nguyen… - International …, 2023 - proceedings.mlr.press

State space models (SSMs) have high performance on long sequence modeling but require
sophisticated initialization techniques and specialized implementations for high quality and …

Cited by 59; 8 versions

[PDF] arxiv.org

FlashFFTConv: Efficient convolutions for long sequences with tensor cores

DY Fu, H Kumbong, E Nguyen, C Ré - ar** density matrix perturbation theory onto the computational structure …

Cited by 16; 13 versions

[PDF] arxiv.org

Acceleration of tensor-product operations with tensor cores

C Cui - ACM Transactions on Parallel Computing, 2024 - dl.acm.org

In this article, we explore the acceleration of tensor product operations in finite element
methods, leveraging the computational power of the NVIDIA A100 GPU Tensor Cores. We …

Cited by 4; 4 versions

[PDF] ed.ac.uk

Bind the gap: Compiling real software to hardware FFT accelerators

J Woodruff, J Armengol-Estapé, S Ainsworth… - Proceedings of the 43rd …, 2022 - dl.acm.org

Specialized hardware accelerators continue to be a source of performance improvement.
However, such specialization comes at a programming price. The fundamental issue is that …

Cited by 13; 5 versions

[PDF] arxiv.org

Accelerating range minimum queries with ray tracing cores

E Meneses, CA Navarro, H Ferrada… - Future Generation …, 2024 - Elsevier

Over the past decade, GPU technology has undergone a notable transformation, evolving
from pure general-purpose computation to the integration of application-specific integrated …

Cited by 5; 4 versions

[PDF] arxiv.org

Reducing shared memory footprint to leverage high throughput on Tensor Cores and its flexible API extension library

H Ootomo, R Yokota - Proceedings of the International Conference on …, 2023 - dl.acm.org

Matrix-matrix multiplication is used for various linear algebra algorithms such as matrix
decomposition and tensor contraction. NVIDIA Tensor Core is a mixed-precision matrix …

Cited by 8; 6 versions
