Evaluating modern GPU interconnect: PCIe, NVLink, NV-SLI, NVSwitch and GPUDirect

A Li, SL Song, J Chen, J Li, X Liu… - … on Parallel and …, 2019 - ieeexplore.ieee.org
High performance multi-GPU computing becomes an inevitable trend due to the ever-
increasing demand on computation capability in emerging domains such as deep learning …

Pump up the volume: Processing large data on GPUs with fast interconnects

C Lutz, S Breß, S Zeuch, T Rabl, V Markl - Proceedings of the 2020 ACM …, 2020 - dl.acm.org
GPUs have long been discussed as accelerators for database query processing because of
their high processing power and memory bandwidth. However, two main challenges limit the …

Deep Learning Library Testing: Definition, Methods and Challenges

X Zhang, W Jiang, C Shen, Q Li, Q Wang, C Lin… - ACM Computing …, 2024 - dl.acm.org
Recently, software systems powered by deep learning (DL) techniques have significantly
facilitated people's lives in many aspects. As the backbone of these DL systems, various DL …

SV-Sim: scalable PGAS-based state vector simulation of quantum circuits

A Li, B Fang, C Granade, G Prawiroatmodjo… - Proceedings of the …, 2021 - dl.acm.org
High-performance quantum circuit simulation in a classic HPC is still imperative in the NISQ
era. Observing that the major obstacle of scalable state-vector quantum simulation arises …

Performance evaluation of advanced features in CUDA unified memory

S Chien, I Peng, S Markidis - 2019 IEEE/ACM Workshop on …, 2019 - ieeexplore.ieee.org
CUDA Unified Memory improves the GPU programmability and also enables GPU memory
oversubscription. Recently, two advanced memory features, memory advises and …

Characterizing deep learning training workloads on Alibaba-PAI

M Wang, C Meng, G Long, C Wu… - 2019 IEEE …, 2019 - ieeexplore.ieee.org
Modern deep learning models have been exploited in various domains, including computer
vision (CV), natural language processing (NLP), search and recommendation. In practical AI …

Griffin: Hardware-software support for efficient page migration in multi-GPU systems

T Baruah, Y Sun, AT Dinçer… - … Symposium on High …, 2020 - ieeexplore.ieee.org
As transistor scaling becomes increasingly more difficult to achieve, scaling the core count
on a single GPU chip has also become extremely challenging. As the volume of data to …

Density matrix quantum circuit simulation via the BSP machine on modern GPU clusters

A Li, O Subasi, X Yang… - … conference for high …, 2020 - ieeexplore.ieee.org
As quantum computers evolve, simulations of quantum programs on classical computers will
be essential in validating quantum algorithms, understanding the effect of system noise, and …

GNNMark: A benchmark suite to characterize graph neural network training on GPUs

T Baruah, K Shivdikar, S Dong, Y Sun… - … Analysis of Systems …, 2021 - ieeexplore.ieee.org
Graph Neural Networks (GNNs) have emerged as a promising class of Machine Learning
algorithms to train on non-Euclidean data. GNNs are widely used in recommender systems …

Evaluating multi-GPU sorting with modern interconnects

T Maltenberger, I Ilic, I Tolovski, T Rabl - Proceedings of the 2022 …, 2022 - dl.acm.org
GPUs have become a mainstream accelerator for database operations such as sorting. Most
GPU sorting algorithms are single-GPU approaches. They neither harness the full …