Need for speed: Experiences building a trustworthy system-level GPU simulator

O Villa, D Lustig, Z Yan, E Bolotin, Y Fu… - … Symposium on High …, 2021 - ieeexplore.ieee.org
The demands of high-performance computing (HPC) and machine learning (ML) workloads
have resulted in the rapid architectural evolution of GPUs over the last decade. The growing …

SAC: Sharing-aware caching in multi-chip GPUs

S Zhang, M Naderan-Tahan, M Jahre… - Proceedings of the 50th …, 2023 - dl.acm.org
Bandwidth non-uniformity in multi-chip GPUs poses a major design challenge for its
last-level cache (LLC) architecture. Whereas a memory-side LLC caches data from the local …

Barre Chord: Efficient Virtual Memory Translation for Multi-Chip-Module GPUs

Y Feng, S Na, H Kim, H Jeon - 2024 ACM/IEEE 51st Annual …, 2024 - ieeexplore.ieee.org
With the advancement of processor packaging technology and the looming end of Moore's
law, multi-chip-module (MCM) GPUs become a promising architecture to continue the …

Locality-centric data and threadblock management for massive GPUs

M Khairy, V Nikiforov, D Nellans… - 2020 53rd Annual IEEE …, 2020 - ieeexplore.ieee.org
Recent work has shown that building GPUs with hundreds of SMs in a single monolithic chip
will not be practical due to slowing growth in transistor density, low chip yields, and …

GPS: A global publish-subscribe model for multi-GPU memory management

H Muthukrishnan, D Lustig, D Nellans… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
Suboptimal management of memory and bandwidth is one of the primary causes of low
performance on systems comprising multiple GPUs. Existing memory management solutions …

CPElide: Efficient Multi-Chiplet GPU Implicit Synchronization

P Dalmia, RS Kumar, MD Sinclair - 2024 57th IEEE/ACM …, 2024 - ieeexplore.ieee.org
Chiplets are transforming computer system designs, allowing system designers to combine
heterogeneous computing resources at unprecedented scales. Breaking larger, monolithic …

REC: Enhancing fine-grained cache coherence protocol in multi-GPU systems

G Ko, J Lee, H Kal, H Lee, WW Ro - Journal of Systems Architecture, 2025 - Elsevier
With the increasing demands of modern workloads, multi-GPU systems have emerged as a
scalable solution, extending performance beyond the capabilities of single GPUs. However …

FinePack: Transparently improving the efficiency of fine-grained transfers in multi-GPU systems

H Muthukrishnan, D Lustig, O Villa… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Recent studies have shown that using fine-grained peer-to-peer (P2P) stores to
communicate among devices in multi-GPU systems is a promising path to achieve strong …

Efficient multi-GPU shared memory via automatic optimization of fine-grained transfers

H Muthukrishnan, D Nellans, D Lustig… - 2021 ACM/IEEE 48th …, 2021 - ieeexplore.ieee.org
Despite continuing research into inter-GPU communication mechanisms, extracting
performance from multi-GPU systems remains a significant challenge. Inter-GPU …

Multi-GPU multi-display rendering of extremely large 3D environments

Y Dong, C Peng - The Visual Computer, 2023 - Springer
In real-time rendering applications, mesh rendering quality suffers from limited GPU memory
capacity and display resolution. Due to the increased complexity of models and the demand …