Programming for high-performance computing on edge accelerators

P Kang - Mathematics, 2023 - mdpi.com
The field of edge computing has grown considerably over the past few years, with
applications in artificial intelligence and big data processing, particularly due to its powerful …

Performance evaluation of Unified Memory with prefetching and oversubscription for selected parallel CUDA applications on NVIDIA Pascal and Volta GPUs

M Knap, P Czarnul - The Journal of Supercomputing, 2019 - Springer
The paper presents assessment of Unified Memory performance with data prefetching and
memory oversubscription. Several versions of code are used with: standard memory …

[BOOK] Parallel programming for modern high performance computing systems

P Czarnul - 2018 - books.google.com
In view of the growing presence and popularity of multicore and manycore processors,
accelerators, and coprocessors, as well as clusters using such computing devices, the …

A multithreaded CUDA and OpenMP based power‐aware programming framework for multi‐node GPU systems

P Czarnul - Concurrency and Computation: Practice and …, 2023 - Wiley Online Library
In the article, we have proposed a framework that allows programming a parallel application
for a multi‐node system, with one or more graphical processing units (GPUs) per node …

Characterizing CUDA Unified Memory (UM)-aware MPI designs on modern GPU architectures

KV Manian, AA Ammar, A Ruhela, CH Chu… - Proceedings of the 12th …, 2019 - dl.acm.org
The CUDA Unified Memory (UM) interface enables a significantly simpler programming
paradigm and has the potential to fundamentally change the way programmers write CUDA …

A quantitative evaluation of unified memory in GPUs

Q Yu, B Childers, L Huang, C Qian, Z Wang - The Journal of …, 2020 - Springer
The introduction of unified memory and demand paging has simplified programming of
graphics processing units (GPUs). It has also enabled oversubscribing the memory for a …

Investigation of parallel data processing using hybrid high performance CPU+GPU systems and CUDA streams

P Czarnul - Computing and informatics, 2020 - cai.sk
The paper investigates parallel data processing in a hybrid CPU+GPU(s) system using
multiple CUDA streams for overlapping communication and computations. This is crucial for …

Online multimedia retrieval on CPU–GPU platforms with adaptive work partition

R Souza, A Fernandes, TSFX Teixeira… - Journal of Parallel and …, 2021 - Elsevier
Nearest neighbors search is a core operation found in several online multimedia services.
These services have to handle very large databases, while, at the same time, they must …

Parallel model counting with CUDA: Algorithm engineering for efficient hardware utilization

JK Fichte, M Hecher, V Roland - 27th International Conference on …, 2021 - drops.dagstuhl.de
Propositional model counting (MC) and its extensions as well as applications in the area of
probabilistic reasoning have received renewed attention in recent years. As a result, also the …

Static graphs for coding productivity in OpenACC

L Toledo, P Valero-Lara, J Vetter… - 2021 IEEE 28th …, 2021 - ieeexplore.ieee.org
The main contribution of this work is to increase the coding productivity for GPU
programming by using the concept of Static Graphs. To do so, we have combined the new …