Need for speed: Experiences building a trustworthy system-level GPU simulator
The demands of high-performance computing (HPC) and machine learning (ML) workloads
have resulted in the rapid architectural evolution of GPUs over the last decade. The growing …
SAC: Sharing-aware caching in multi-chip GPUs
Bandwidth non-uniformity in multi-chip GPUs poses a major design challenge for its last-
level cache (LLC) architecture. Whereas a memory-side LLC caches data from the local …
Barre Chord: Efficient Virtual Memory Translation for Multi-Chip-Module GPUs
With the advancement of processor packaging technology and the looming end of Moore's
law, multi-chip-module (MCM) GPUs become a promising architecture to continue the …
Locality-centric data and threadblock management for massive GPUs
Recent work has shown that building GPUs with hundreds of SMs in a single monolithic chip
will not be practical due to slowing growth in transistor density, low chip yields, and …
GPS: A global publish-subscribe model for multi-GPU memory management
Suboptimal management of memory and bandwidth is one of the primary causes of low
performance on systems comprising multiple GPUs. Existing memory management solutions …
CPElide: Efficient Multi-Chiplet GPU Implicit Synchronization
Chiplets are transforming computer system designs, allowing system designers to combine
heterogeneous computing resources at unprecedented scales. Breaking larger, monolithic …
REC: Enhancing fine-grained cache coherence protocol in multi-GPU systems
With the increasing demands of modern workloads, multi-GPU systems have emerged as a
scalable solution, extending performance beyond the capabilities of single GPUs. However …
FinePack: Transparently improving the efficiency of fine-grained transfers in multi-GPU systems
Recent studies have shown that using fine-grained peer-to-peer (P2P) stores to
communicate among devices in multi-GPU systems is a promising path to achieve strong …
Efficient multi-GPU shared memory via automatic optimization of fine-grained transfers
Despite continuing research into inter-GPU communication mechanisms, extracting
performance from multi-GPU systems remains a significant challenge. Inter-GPU …
Multi-GPU multi-display rendering of extremely large 3D environments
Y Dong, C Peng - The Visual Computer, 2023 - Springer
In real-time rendering applications, mesh rendering quality suffers from limited GPU memory
capacity and display resolution. Due to the increased complexity of models and the demand …