DISTWAR: Fast differentiable rendering on raster-based rendering pipelines

S Durvasula, A Zhao, F Chen, R Liang… - arXiv preprint arXiv …, 2023 - arxiv.org
Differentiable rendering is a technique used in an important emerging class of visual
computing applications that involves representing a 3D scene as a model that is trained from …

HMG: Extending cache coherence protocols across modern hierarchical multi-GPU systems

X Ren, D Lustig, E Bolotin, A Jaleel… - … Symposium on High …, 2020 - ieeexplore.ieee.org
Prior work on GPU cache coherence has shown that simple hardware- or software-based
protocols can be more than sufficient. However, in recent years, features such as multi-chip …

GPS: A global publish-subscribe model for multi-GPU memory management

H Muthukrishnan, D Lustig, D Nellans… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
Suboptimal management of memory and bandwidth is one of the primary causes of low
performance on systems comprising multiple GPUs. Existing memory management solutions …

FinePack: Transparently improving the efficiency of fine-grained transfers in multi-GPU systems

H Muthukrishnan, D Lustig, O Villa… - … Symposium on High …, 2023 - ieeexplore.ieee.org
Recent studies have shown that using fine-grained peer-to-peer (P2P) stores to
communicate among devices in multi-GPU systems is a promising path to achieve strong …

REC: Enhancing fine-grained cache coherence protocol in multi-GPU systems

G Ko, J Lee, H Kal, H Lee, WW Ro - Journal of Systems Architecture, 2025 - Elsevier
With the increasing demands of modern workloads, multi-GPU systems have emerged as a
scalable solution, extending performance beyond the capabilities of single GPUs. However …

A survey of architectural approaches for improving GPGPU performance, programmability and heterogeneity

M Khairy, AG Wassal, M Zahran - Journal of Parallel and Distributed …, 2019 - Elsevier
With the skyrocketing advances of process technology, the increased need to process huge
amount of data, and the pivotal need for power efficiency, the usage of Graphics Processing …

HeteroGen: Automatic synthesis of heterogeneous cache coherence protocols

N Oswald, V Nagarajan, DJ Sorin… - … Symposium on High …, 2022 - ieeexplore.ieee.org
We solve the two challenges architects face when designing heterogeneous processors with
cache coherent shared memory. First, we develop an automated tool, called HeteroGen, for …

Only buffer when you need to: Reducing on-chip GPU traffic with reconfigurable local atomic buffers

P Dalmia, R Mahapatra… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
In recent years, due to their wide availability and ease of programming, GPUs have emerged
as the accelerator of choice for a wide variety of applications including graph analytics and …

Fast fine-grained global synchronization on GPUs

K Wang, D Fussell, C Lin - … of the Twenty-Fourth International Conference …, 2019 - dl.acm.org
This paper extends the reach of General Purpose GPU programming by presenting a
software architecture that supports efficient fine-grained synchronization over global …

Exploring memory persistency models for GPUs

Z Lin, M Alshboul, Y Solihin… - 2019 28th International …, 2019 - ieeexplore.ieee.org
Given its high integration density, high speed, byte addressability, and low standby power,
non-volatile or persistent memory is expected to supplement/replace DRAM as main …