Morpheus: Creating application objects efficiently for heterogeneous computing

HW Tseng, Q Zhao, Y Zhou, M Gahagan… - ACM SIGARCH …, 2016 - dl.acm.org
In high performance computing systems, object deserialization can become a surprisingly
important bottleneck: in our test, a set of general-purpose, highly parallelized applications …

Multi-GPU communication schemes for iterative solvers: When CPUs are not in charge

I Ismayilov, J Baydamirli, D Sağbili, M Wahib… - Proceedings of the 37th …, 2023 - dl.acm.org
This paper proposes a fully autonomous execution model for multi-GPU applications that
completely excludes the involvement of the CPU beyond the initial kernel launch. In a typical …

The landscape of GPU-centric communication

D Unat, I Turimbetov, MKT Issa, D Sağbili… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, GPUs have become the preferred accelerators for HPC and ML applications
due to their parallelism and fast memory bandwidth. While GPUs boost computation, inter …

Network endpoint congestion control for fine-grained communication

N Jiang, L Dennison, WJ Dally - … of the International Conference for High …, 2015 - dl.acm.org
Endpoint congestion in HPC networks creates tree saturation that is detrimental to
performance. Endpoint congestion can be alleviated by reducing the injection rate of traffic …

dCUDA: hardware supported overlap of computation and communication

T Gysi, J Bär, T Hoefler - SC'16: Proceedings of the …, 2016 - ieeexplore.ieee.org
Over the last decade, CUDA and the underlying GPU hardware architecture have
continuously gained popularity in various high-performance computing application domains …

InfiniBand Verbs on GPU: a case study of controlling an InfiniBand network device from the GPU

L Oden, H Fröning - The International Journal of High …, 2017 - journals.sagepub.com
Due to their massive parallelism and high performance per Watt, GPUs have gained high
popularity in high-performance computing and are a strong candidate for future exascale …

GPU initiated OpenSHMEM: correct and efficient intra-kernel networking for dGPUs

K Hamidouche, M LeBeane - Proceedings of the 25th ACM SIGPLAN …, 2020 - dl.acm.org
Current state-of-the-art in GPU networking utilizes a host-centric, kernel-boundary
communication model that reduces performance and increases code complexity. To address …

Exploiting GPUDirect RDMA in designing high performance OpenSHMEM for NVIDIA GPU clusters

K Hamidouche, A Venkatesh, AA Awan… - 2015 IEEE …, 2015 - ieeexplore.ieee.org
GPUDirect RDMA (GDR) brings the high-performance communication capabilities of RDMA
networks like InfiniBand (IB) to GPUs (referred to as "Device"). It enables IB network …

Relaxations for high-performance message passing on massively parallel SIMT processors

B Klenk, H Fröning, H Eberle… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
Accelerators, such as GPUs, have proven to be highly successful in reducing execution time
and power consumption of compute-intensive applications. Even though they are already …

GPU triggered networking for intra-kernel communications

M LeBeane, K Hamidouche, B Benton… - Proceedings of the …, 2017 - dl.acm.org
GPUs are widespread across clusters of compute nodes due to their attractive performance
for data parallel codes. However, communicating between GPUs across the cluster is …