Energy-aware scheduling for high-performance computing systems: A survey

B Kocot, P Czarnul, J Proficz - Energies, 2023 - mdpi.com
High-performance computing (HPC), according to its name, is traditionally oriented toward
performance, especially the execution time and scalability of the computations. However …

DeepRecSys: A system for optimizing end-to-end at-scale neural recommendation inference

U Gupta, S Hsia, V Saraph, X Wang… - 2020 ACM/IEEE 47th …, 2020 - ieeexplore.ieee.org
Neural personalized recommendation is the cornerstone of a wide collection of cloud
services and products, constituting significant compute demand of cloud infrastructure. Thus …

Summarizing CPU and GPU design trends with product data

Y Sun, NB Agostini, S Dong, D Kaeli - arXiv preprint arXiv:1911.11313, 2019 - arxiv.org
Moore's Law and Dennard Scaling have guided the semiconductor industry for the past few
decades. Recently, both laws have faced validity challenges as transistor sizes approach …

MGPUSim: Enabling multi-GPU performance modeling and optimization

Y Sun, T Baruah, SA Mojumder, S Dong… - Proceedings of the 46th …, 2019 - dl.acm.org
The rapidly growing popularity and scale of data-parallel workloads demand a
corresponding increase in raw computational power of Graphics Processing Units (GPUs) …

Chai: Collaborative heterogeneous applications for integrated-architectures

J Gómez-Luna, I El Hajj, LW Chang… - … Analysis of Systems …, 2017 - ieeexplore.ieee.org
Heterogeneous system architectures are evolving towards tighter integration among
devices, with emerging features such as shared virtual memory, memory coherence, and …

GNNMark: A benchmark suite to characterize graph neural network training on GPUs

T Baruah, K Shivdikar, S Dong, Y Sun… - … Analysis of Systems …, 2021 - ieeexplore.ieee.org
Graph Neural Networks (GNNs) have emerged as a promising class of Machine Learning
algorithms to train on non-euclidean data. GNNs are widely used in recommender systems …

Grus: Toward unified-memory-efficient high-performance graph processing on GPU

P Wang, J Wang, C Li, J Wang, H Zhu… - ACM Transactions on …, 2021 - dl.acm.org
Today's GPU graph processing frameworks face scalability and efficiency issues as the
graph size exceeds GPU-dedicated memory limit. Although recent GPUs can over-subscribe …

Griffin: Hardware-software support for efficient page migration in multi-GPU systems

T Baruah, Y Sun, AT Dinçer… - … Symposium on High …, 2020 - ieeexplore.ieee.org
As transistor scaling becomes increasingly more difficult to achieve, scaling the core count
on a single GPU chip has also become extremely challenging. As the volume of data to …

IDYLL: Enhancing Page Translation in Multi-GPUs via Light Weight PTE Invalidations

B Li, Y Guo, Y Wang, A Jaleel, J Yang… - Proceedings of the 56th …, 2023 - dl.acm.org
Multi-GPU systems have emerged as a desirable platform to deliver high computing
capabilities and large memory capacity to accommodate large dataset sizes. However …

Analysis and modeling of collaborative execution strategies for heterogeneous CPU-FPGA architectures

S Huang, LW Chang, I El Hajj… - Proceedings of the …, 2019 - dl.acm.org
Heterogeneous CPU-FPGA systems are evolving towards tighter integration between CPUs
and FPGAs for improved performance and energy efficiency. At the same time …