Parallel programming models for heterogeneous many-cores: a comprehensive survey

J Fang, C Huang, T Tang, Z Wang - CCF Transactions on High …, 2020 - Springer
Heterogeneous many-cores are now an integral part of modern computing systems ranging
from embedded systems to supercomputers. While heterogeneous many-core design offers …

Accelerating AutoDock4 with GPUs and Gradient-Based Local Search

D Santos-Martins, L Solis-Vasquez… - Journal of chemical …, 2021 - ACS Publications
AutoDock4 is a widely used program for docking small molecules to macromolecular targets.
It describes ligand–receptor interactions using a physics-inspired scoring function that has …

Evaluating attainable memory bandwidth of parallel programming models via BabelStream

T Deakin, J Price, M Martineau… - International Journal …, 2018 - inderscienceonline.com
Many scientific codes consist of memory bandwidth bound kernels. One major advantage of
many-core devices such as general purpose graphics processing units (GPGPUs) and the …

gpucc: an open-source GPGPU compiler

J Wu, A Belevich, E Bendersky, M Heffernan… - Proceedings of the …, 2016 - dl.acm.org
Graphics Processing Units have emerged as powerful accelerators for massively parallel,
numerically intensive workloads. The two dominant software models for these devices are …

High-performance GPU-to-CPU transpilation and optimization via high-level parallel constructs

WS Moses, IR Ivanov, J Domke, T Endo… - Proceedings of the 28th …, 2023 - dl.acm.org
While parallelism remains the main source of performance, architectural implementations
and programming models change with each new hardware generation, often leading to …

HW/SW co-design toolset for customization of exposed datapath processors

P Jääskeläinen, T Viitanen, J Takala, H Berg - Computing platforms for …, 2017 - Springer
Customized processors are an interesting option for implementing software defined radios;
they bring benefits of tailored fixed function hardware while adding new advantages such as …

Evaluating the Performance of the hipSYCL Toolchain for HPC Kernels on NVIDIA V100 GPUs

B Homerding, J Tramm - Proceedings of the International Workshop on …, 2020 - dl.acm.org
Future HPC leadership computing systems for the United States Department of Energy will
utilize GPUs for acceleration of scientific codes. These systems will utilize GPUs from …

KLAP: Kernel launch aggregation and promotion for optimizing dynamic parallelism

I El Hajj, J Gómez-Luna, C Li, LW Chang… - 2016 49th Annual …, 2016 - ieeexplore.ieee.org
Dynamic parallelism on GPUs simplifies the programming of many classes of applications
that generate parallelizable work not known prior to execution. However, modern GPUs …

Programming bare-metal accelerators with heterogeneous threading models: a case study of Matrix-3000

J Fang, P Zhang, C Huang, T Tang, K Lu… - Frontiers of Information …, 2023 - Springer
As the hardware industry moves toward using specialized heterogeneous many-core
processors to avoid the effects of the power wall, software developers are finding it hard to …

COX: Exposing CUDA warp-level functions to CPUs

R Han, J Lee, J Sim, H Kim - ACM Transactions on Architecture and …, 2022 - dl.acm.org
As CUDA becomes the de facto programming language among data parallel applications
such as high-performance computing or machine learning applications, running CUDA on …