Stash: Have your scratchpad and cache it too

R Komuravelli, MD Sinclair, J Alsop, M Huzaifa… - ACM SIGARCH …, 2015 - dl.acm.org
Heterogeneous systems employ specialization for energy efficiency. Since data movement
is expected to be a dominant consumer of energy, these systems employ specialized …

Efficient GPU synchronization without scopes: Saying no to complex consistency models

MD Sinclair, J Alsop, SV Adve - … of the 48th International Symposium on …, 2015 - dl.acm.org
As GPUs have become increasingly general purpose, applications with more general
sharing patterns and fine-grained synchronization have started to emerge. Unfortunately …

Spandex: A flexible interface for efficient heterogeneous coherence

J Alsop, M Sinclair, S Adve - 2018 ACM/IEEE 45th Annual …, 2018 - ieeexplore.ieee.org
Recent heterogeneous architectures have trended toward tighter integration and shared
memory largely due to the efficient communication and programmability enabled by this …

Selective GPU caches to eliminate CPU-GPU HW cache coherence

N Agarwal, D Nellans, E Ebrahimi… - … Symposium on High …, 2016 - ieeexplore.ieee.org
Cache coherence is ubiquitous in shared memory multiprocessors because it provides a
simple, high performance memory abstraction to programmers. Recent work suggests …

Chasing away RAts: Semantics and evaluation for relaxed atomics on heterogeneous systems

MD Sinclair, J Alsop, SV Adve - Proceedings of the 44th Annual …, 2017 - dl.acm.org
An unambiguous and easy-to-understand memory consistency model is crucial for ensuring
correct synchronization and guiding future design of heterogeneous systems. In a widely …

Lazy release consistency for GPUs

J Alsop, MS Orr, BM Beckmann… - 2016 49th Annual IEEE …, 2016 - ieeexplore.ieee.org
The heterogeneous-race-free (HRF) memory model has been embraced by the
Heterogeneous System Architecture (HSA) Foundation and OpenCL™ because it clearly …

Coherence domain restriction on large scale systems

Y Fu, TM Nguyen, D Wentzlaff - … of the 48th International Symposium on …, 2015 - dl.acm.org
Designing massive scale cache coherence systems has been an elusive goal. Whether it be
on large-scale GPUs, future thousand-core chips, or across million-core warehouse scale …

Mozart: Taming taxes and composing accelerators with shared-memory

V Suresh, B Mishra, Y Jing, Z Zhu, N Jin… - Proceedings of the …, 2024 - dl.acm.org
Resource-constrained system-on-chips (SoCs) are increasingly heterogeneous with
specialized accelerators for various tasks. Acceleration taxes due to control and data …

Racer: TSO consistency via race detection

A Ros, S Kaxiras - 2016 49th Annual IEEE/ACM International …, 2016 - ieeexplore.ieee.org
Several recent efforts aim to simplify coherence and its associated costs (e.g., directory size,
complexity) in multicores. The bulk of these efforts rely on program data-race-free (DRF) …

Callback: Efficient synchronization without invalidation with a directory just for spin-waiting

A Ros, S Kaxiras - Proceedings of the 42nd Annual International …, 2015 - dl.acm.org
Cache coherence protocols based on self-invalidation allow a simpler design compared to
traditional invalidation-based protocols, by relying on data-race-free (DRF) semantics and …