MOESI-prime: preventing coherence-induced hammering in commodity workloads

K Loughlin, S Saroiu, A Wolman, YA Manerkar… - Proceedings of the 49th …, 2022 - dl.acm.org
Prior work shows that Rowhammer attacks---which flip bits in DRAM via frequent activations
of the same row(s)---are viable. Adversaries typically mount these attacks via instruction …

A communication characterisation of SPLASH-2 and PARSEC

N Barrow-Williams, C Fensch… - 2009 IEEE international …, 2009 - ieeexplore.ieee.org
Recent benchmark suite releases such as Parsec specifically utilise the tightly coupled
cores available in chip-multiprocessors to allow the use of newer, high performance, models …

Stream floating: Enabling proactive and decentralized cache optimizations

Z Wang, J Weng, J Lowe-Power, J Gaur… - … Symposium on High …, 2021 - ieeexplore.ieee.org
As multicore systems continue to grow in scale and on-chip memory capacity, the on-chip
network bandwidth and latency become problematic bottlenecks. Because of this …

POPS: Coherence protocol optimization for both private and shared data

H Hossain, S Dwarkadas… - … Conference on Parallel …, 2011 - ieeexplore.ieee.org
As the number of cores in a chip multiprocessor (CMP) increases, the need for larger
on-chip caches also increases in order to avoid creating a bottleneck at the off-chip …

Maximum multicore power (MAMPO): an automatic multithreaded synthetic power virus generation framework for multicore systems

K Ganesan, LK John - Proceedings of 2011 International Conference for …, 2011 - dl.acm.org
The practically attainable worst case power consumption for a computer system is a
significant design parameter and it is a very tedious process to determine it by manually …

A direct coherence protocol for many-core chip multiprocessors

A Ros, ME Acacio, JM García - IEEE Transactions on Parallel …, 2010 - ieeexplore.ieee.org
Future many-core CMP designs that will integrate tens of processor cores on-chip will be
constrained by area and power. Area constraints make impractical the use of a bus or a …

Automatic generation of miniaturized synthetic proxies for target applications to efficiently design multicore processors

K Ganesan, LK John - IEEE Transactions on Computers, 2013 - ieeexplore.ieee.org
Prohibitive simulation time with pre-silicon design models and unavailability of proprietary
target applications make microprocessor design very tedious. The framework proposed in …

DiCo-CMP: Efficient cache coherency in tiled CMP architectures

A Ros, ME Acacio, JM García - 2008 IEEE International …, 2008 - ieeexplore.ieee.org
Future CMP designs that will integrate tens of processor cores on-chip will be constrained by
area and power. Area constraints make impractical the use of a bus or a crossbar as the on …

Token tenure: PATCHing token counting using directory-based cache coherence

A Raghavan, C Blundell… - 2008 41st IEEE/ACM …, 2008 - ieeexplore.ieee.org
Traditional coherence protocols present a set of difficult tradeoffs: the reliance of snoopy
protocols on broadcast and ordered interconnects limits their scalability, while directory …

Adaptive cache coherence mechanisms with producer–consumer sharing optimization for chip multiprocessors

A Kayi, O Serres, T El-Ghazawi - IEEE Transactions on …, 2013 - ieeexplore.ieee.org
In chip multiprocessors (CMPs), maintaining cache coherence can account for a major
performance overhead. Write-invalidate protocols adapted by most CMPs generate high …