Google Scholar

G Araujo, D Griebler, DA Rockenbach… - Software: Practice …, 2023 - Wiley Online Library

Abstract NAS Parallel Benchmarks (NPB) is a standard benchmark suite used in the
evaluation of parallel hardware and software. Several research efforts from academia have …

Cited by 26 · 4 versions

[PDF] pucrs.br

Efficient NAS parallel benchmark kernels with CUDA

GA de Araujo, D Griebler, M Danelutto… - 2020 28th Euromicro …, 2020 - ieeexplore.ieee.org

NAS Parallel Benchmarks (NPB) are one of the standard benchmark suites used to evaluate
parallel hardware and software. There are many research efforts trying to provide different …

Cited by 23 · 4 versions

[PDF] stonybrook.edu

Optimizing GPU register usage: Extensions to OpenACC and compiler optimizations

X Tian, D Khaldi, D Eachempati, R Xu… - 2016 45th …, 2016 - ieeexplore.ieee.org

Using compiler directives to program accelerator-based systems through APIs such as
OpenACC or OpenMP has increasingly gained popularity due to the portability and …

Cited by 9 · 5 versions

[PDF] acm.org

Exploring OpenMP GPU Offloading for Implementing Convolutional Neural Networks

K Yan, Y Shi, Y Yan - Proceedings of the 14th International Workshop on …, 2023 - dl.acm.org

Computing on heterogeneous architecture involving CPUs and accelerators is now a
popular choice of parallel computing. As a directive-based programming model, OpenMP …

[PDF] manchester.ac.uk

Automatically exploiting the memory hierarchy of gpus through just-in-time compilation

M Papadimitriou, J Fumero, A Stratikopoulos… - Proceedings of the 17th …, 2021 - dl.acm.org

Although Graphics Processing Units (GPUs) have become pervasive for data-parallel
workloads, the efficient exploitation of their tiered memory hierarchy requires explicit …

Cited by 3 · 5 versions

[PDF] tdl.org

Optimizing the Performance of Directive-based Programming Model for GPGPUs

R Xu - 2016 - uh-ir.tdl.org

Accelerators have been deployed on most major HPC systems. They are considered to
improve the performance of many applications. Accelerators such as GPUs have an …

Cited by 2 · 5 versions

[PDF] bracu.ac.bd

Optimizing apples lossless audio codec algorithm using NVIDIA CUDA

R Ahmed, MS Islam - 2016 - dspace.bracu.ac.bd

As majority of the compression algorithms are implementations for CPU architecture, the
primary focus of our work is to exploit the opportunities of GPU parallelism in audio …

Cited by 2 · 2 versions

the th International Workshop on Programming Models and Applications for Multicores and Manycores

ACM SIGPLAN, ACM SIGHPC - dl.acm.org

Matrix computations are widely used in increasing sizes and complexity in scientific
computing and engineering. But current matrix language implementations lack programmer …

2 versions

An open-source solution to performance portability for Summit and Sierra supercomputers

GT Bercea, A Bataev, AE Eichenberger… - IBM Journal of …, 2019 - ieeexplore.ieee.org

Programming models that use a higher level of abstraction to express parallelism can target
both CPUs and any attached devices, alleviating the maintainability and portability concerns …

Cited by 1 · 3 versions

[PDF] cardiff.ac.uk

Locality data properties of 3D data orderings with application to parallel molecular dynamics simulations

I Al Kharusi - 2019 - orca.cardiff.ac.uk

General-purpose computing on GPUs is widely adopted for scientific applications, providing
inexpensive platforms for massively parallel computation. This has motivated us to …

4 versions

Compiler transformation of nested loops for general purpose GPUs

NAS Parallel Benchmarks with CUDA and beyond
