Evaluating modern GPU interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect

A Li, SL Song, J Chen, J Li, X Liu… - … on Parallel and …, 2019 - ieeexplore.ieee.org
High performance multi-GPU computing becomes an inevitable trend due to the ever-
increasing demand on computation capability in emerging domains such as deep learning …

GPU-accelerated molecular dynamics: State-of-art software performance and porting from Nvidia CUDA to AMD HIP

N Kondratyuk, V Nikolskiy, D Pavlov… - … Journal of High …, 2021 - journals.sagepub.com
Classical molecular dynamics (MD) calculations represent a significant part of the utilization
time of high-performance computing systems. As usual, the efficiency of such calculations is …

MGPUSim: Enabling multi-GPU performance modeling and optimization

Y Sun, T Baruah, SA Mojumder, S Dong… - Proceedings of the 46th …, 2019 - dl.acm.org
The rapidly growing popularity and scale of data-parallel workloads demand a
corresponding increase in raw computational power of Graphics Processing Units (GPUs) …

Analyzing machine learning workloads using a detailed GPU simulator

J Lew, DA Shah, S Pati, S Cattell… - … analysis of systems …, 2019 - ieeexplore.ieee.org
Machine learning (ML) has recently emerged as an important application driving future
architecture design. Traditionally, architecture research has used detailed simulators to …

Tartan: evaluating modern GPU interconnect via a multi-GPU benchmark suite

A Li, SL Song, J Chen, X Liu, N Tallent… - 2018 IEEE …, 2018 - ieeexplore.ieee.org
High performance multi-GPU computing becomes an inevitable trend due to the ever-
increasing demand on computation capability in emerging domains such as deep learning …

Analysis and modeling of collaborative execution strategies for heterogeneous CPU-FPGA architectures

S Huang, LW Chang, I El Hajj… - Proceedings of the …, 2019 - dl.acm.org
Heterogeneous CPU-FPGA systems are evolving towards tighter integration between CPUs
and FPGAs for improved performance and energy efficiency. At the same time …

NaviSim: A highly accurate GPU simulator for AMD RDNA GPUs

Y Bao, Y Sun, Z Feric, MT Shen, M Weston… - Proceedings of the …, 2022 - dl.acm.org
As GPUs continue to grow in popularity for accelerating demanding applications, such as
high-performance computing and machine learning, GPU architects need to deliver more …

Path Forward Beyond Simulators: Fast and Accurate GPU Execution Time Prediction for DNN Workloads

Y Li, Y Sun, A Jog - Proceedings of the 56th Annual IEEE/ACM …, 2023 - dl.acm.org
Today, DNNs' high computational complexity and sub-optimal device utilization present a
major roadblock to democratizing DNNs. To reduce the execution time and improve device …

Preparing Ginkgo for AMD GPUs – a testimonial on porting CUDA code to HIP

YM Tsai, T Cojean, T Ribizel, H Anzt - European Conference on Parallel …, 2020 - Springer
With AMD reinforcing their ambition in the scientific high performance computing ecosystem,
we extend the hardware scope of the Ginkgo linear algebra package to feature a HIP …

[BOOK][B] Heterogeneous computing architectures: Challenges and vision

O Terzo, K Djemame, A Scionti, C Pezuela - 2019 - books.google.com
Heterogeneous Computing Architectures: Challenges and Vision provides an updated
vision of the state-of-the-art of heterogeneous computing systems, covering all the aspects …