Hyper-AP: Enhancing associative processing through a full-stack optimization

Y Zha, J Li - 2020 ACM/IEEE 47th Annual International …, 2020 - ieeexplore.ieee.org
Associative processing (AP) is a promising PIM paradigm that overcomes the von Neumann
bottleneck (memory wall) by virtue of a radically different execution model. By decomposing …

Verified instruction-level energy consumption measurement for NVIDIA GPUs

Y Arafa, A ElWazir, A ElKanishy, Y Aly… - Proceedings of the 17th …, 2020 - dl.acm.org
GPUs are prevalent in modern computing systems at all scales. They consume a significant
fraction of the energy in these systems. However, vendors do not publish the actual cost of …

Hybrid, scalable, trace-driven performance modeling of GPGPUs

Y Arafa, AH Badawy, A ElWazir, A Barai… - Proceedings of the …, 2021 - dl.acm.org
In this paper, we present PPT-GPU, a scalable performance prediction toolkit for GPUs.
PPT-GPU achieves scalability through a hybrid high-level modeling approach where some …

Benchmarking and dissecting the NVIDIA Hopper GPU architecture

W Luo, R Fan, Z Li, D Du, Q Wang, X Chu - arXiv preprint arXiv …, 2024 - arxiv.org
Graphics processing units (GPUs) are continually evolving to cater to the computational
demands of contemporary general-purpose workloads, particularly those driven by artificial …

Demystifying the NVIDIA Ampere architecture through microbenchmarking and instruction-level analysis

H Abdelkhalik, Y Arafa, N Santhi… - 2022 IEEE High …, 2022 - ieeexplore.ieee.org
Graphics Processing Units (GPUs) are now considered the leading hardware to accelerate
general-purpose workloads such as AI, data analytics, and HPC. Over the last decade …

Guardian: Safe GPU Sharing in Multi-Tenant Environments

M Pavlidakis, G Vasiliadis, S Mavridis… - Proceedings of the 25th …, 2024 - dl.acm.org
Modern GPU applications, such as machine learning (ML), can only partially utilize GPUs,
leading to GPU underutilization in cloud environments. Sharing GPUs across multiple …

Fast, accurate, and scalable memory modeling of GPGPUs using reuse profiles

Y Arafa, AH Badawy, G Chennupati, A Barai… - Proceedings of the 34th …, 2020 - dl.acm.org
In this paper, we introduce an accurate and scalable memory modeling framework for
General Purpose Graphics Processor units (GPGPUs), PPT-GPU-Mem. That is Performance …

MPU: Memory-centric SIMT Processor via In-DRAM Near-bank Computing

X Xie, P Gu, Y Ding, D Niu, H Zheng, Y Xie - ACM Transactions on …, 2023 - dl.acm.org
With the growing number of data-intensive workloads, GPU, which is the state-of-the-art
single-instruction-multiple-thread (SIMT) processor, is hindered by the memory bandwidth …

ParallelFusion: towards maximum utilization of mobile GPU for DNN inference

J Lee, Y Liu, Y Lee - Proceedings of the 5th International Workshop on …, 2021 - dl.acm.org
Mobile GPUs are extremely under-utilized for DNN computations across different mobile
deep learning frameworks and multiple DNNs with various complexities. We explore the …

G-Safe: Safe GPU Sharing in Multi-Tenant Environments

M Pavlidakis, G Vasiliadis, S Mavridis… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern GPU applications, such as machine learning (ML) frameworks, can only partially
utilize beefy GPUs, leading to GPU underutilization in cloud environments. Sharing GPUs …