Evaluating modern GPU interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect
High performance multi-GPU computing becomes an inevitable trend due to the ever-
increasing demand on computation capability in emerging domains such as deep learning …
GPU-accelerated molecular dynamics: State-of-art software performance and porting from Nvidia CUDA to AMD HIP
Classical molecular dynamics (MD) calculations represent a significant part of the utilization
time of high-performance computing systems. As usual, the efficiency of such calculations is …
MGPUSim: Enabling multi-GPU performance modeling and optimization
The rapidly growing popularity and scale of data-parallel workloads demand a
corresponding increase in raw computational power of Graphics Processing Units (GPUs) …
Analyzing machine learning workloads using a detailed GPU simulator
Machine learning (ML) has recently emerged as an important application driving future
architecture design. Traditionally, architecture research has used detailed simulators to …
Tartan: evaluating modern GPU interconnect via a multi-GPU benchmark suite
High performance multi-GPU computing becomes an inevitable trend due to the ever-
increasing demand on computation capability in emerging domains such as deep learning …
Analysis and modeling of collaborative execution strategies for heterogeneous CPU-FPGA architectures
Heterogeneous CPU-FPGA systems are evolving towards tighter integration between CPUs
and FPGAs for improved performance and energy efficiency. At the same time …
NaviSim: A highly accurate GPU simulator for AMD RDNA GPUs
As GPUs continue to grow in popularity for accelerating demanding applications, such as
high-performance computing and machine learning, GPU architects need to deliver more …
Path Forward Beyond Simulators: Fast and Accurate GPU Execution Time Prediction for DNN Workloads
Today, DNNs' high computational complexity and sub-optimal device utilization present a
major roadblock to democratizing DNNs. To reduce the execution time and improve device …
Preparing Ginkgo for AMD GPUs – a testimonial on porting CUDA code to HIP
With AMD reinforcing their ambition in the scientific high performance computing ecosystem,
we extend the hardware scope of the Ginkgo linear algebra package to feature a HIP …
[BOOK][B] Heterogeneous computing architectures: Challenges and vision
Heterogeneous Computing Architectures: Challenges and Vision provides an updated
vision of the state-of-the-art of heterogeneous computing systems, covering all the aspects …