NAS Parallel Benchmarks with CUDA and beyond
Abstract NAS Parallel Benchmarks (NPB) is a standard benchmark suite used in the
evaluation of parallel hardware and software. Several research efforts from academia have …
Efficient NAS parallel benchmark kernels with CUDA
NAS Parallel Benchmarks (NPB) are one of the standard benchmark suites used to evaluate
parallel hardware and software. There are many research efforts trying to provide different …
Optimizing GPU register usage: Extensions to OpenACC and compiler optimizations
Using compiler directives to program accelerator-based systems through APIs such as
OpenACC or OpenMP has increasingly gained popularity due to the portability and …
Automatically exploiting the memory hierarchy of GPUs through just-in-time compilation
Although Graphics Processing Units (GPUs) have become pervasive for data-parallel
workloads, the efficient exploitation of their tiered memory hierarchy requires explicit …
Exploring OpenMP GPU Offloading for Implementing Convolutional Neural Networks
Computing on heterogeneous architecture involving CPUs and accelerators is now a
popular choice of parallel computing. As a directive-based programming model, OpenMP …
Optimizing the Performance of Directive-based Programming Model for GPGPUs
R Xu - 2016 - uh-ir.tdl.org
Accelerators have been deployed on most major HPC systems. They are considered to
improve the performance of many applications. Accelerators such as GPUs have an …
the th International Workshop on Programming Models and Applications for Multicores and Manycores
ACM SIGPLAN, ACM SIGHPC - dl.acm.org
Matrix computations are widely used in increasing sizes and complexity in scientific
computing and engineering. But current matrix language implementations lack programmer …
Optimizing Apple Lossless Audio Codec algorithm using NVIDIA CUDA
R Ahmed, MS Islam - 2016 - dspace.bracu.ac.bd
As the majority of compression algorithms are implemented for CPU architectures, the
primary focus of our work is to exploit the opportunities of GPU parallelism in audio …
An open-source solution to performance portability for Summit and Sierra supercomputers
Programming models that use a higher level of abstraction to express parallelism can target
both CPUs and any attached devices, alleviating the maintainability and portability concerns …
Implementação CUDA dos Kernels NPB (CUDA Implementation of the NPB Kernels)
NAS Parallel Benchmarks (NPB) is a benchmark suite used to evaluate
hardware and software, which over the years has been ported to different frameworks …