A survey of recent prefetching techniques for processor caches

S Mittal - ACM Computing Surveys (CSUR), 2016 - dl.acm.org
As the trends of process scaling make memory systems an even more crucial bottleneck, the
importance of latency hiding techniques such as prefetching grows further. However, naively …

A survey of cache bypassing techniques

S Mittal - Journal of Low Power Electronics and Applications, 2016 - mdpi.com
With increasing core-count, the cache demand of modern processors has also increased.
However, due to strict area/power budgets and the presence of workloads with poor data locality …

Density tradeoffs of non-volatile memory as a replacement for SRAM based last level cache

K Korgaonkar, I Bhati, H Liu, J Gaur… - 2018 ACM/IEEE 45th …, 2018 - ieeexplore.ieee.org
Increasing the capacity of the Last Level Cache (LLC) can help scale the memory wall. Due
to prohibitive area and leakage power, however, growing conventional SRAM LLC already …

Stream-based memory access specialization for general purpose processors

Z Wang, T Nowatzki - Proceedings of the 46th International Symposium …, 2019 - dl.acm.org
Because of severe limitations in technology scaling, architects have innovated in
specializing general purpose processors for computation primitives (e.g., vector instructions …

Stream floating: Enabling proactive and decentralized cache optimizations

Z Wang, J Weng, J Lowe-Power, J Gaur… - … Symposium on High …, 2021 - ieeexplore.ieee.org
As multicore systems continue to grow in scale and on-chip memory capacity, the on-chip
network bandwidth and latency become problematic bottlenecks. Because of this …

Criticality aware tiered cache hierarchy: A fundamental relook at multi-level cache hierarchies

AV Nori, J Gaur, S Rai, S Subramoney… - 2018 ACM/IEEE 45th …, 2018 - ieeexplore.ieee.org
On-die caches are a popular method to help hide the main memory latency. However, it is
difficult to build large caches without substantially increasing their access latency, which in …

The reuse cache: Downsizing the shared last-level cache

J Albericio, P Ibáñez, V Viñals, JM Llabería - Proceedings of the 46th …, 2013 - dl.acm.org
Over recent years, a growing body of research has shown that a considerable portion of the
shared last-level cache (SLLC) is dead, meaning that the corresponding cache lines are …

GMT: GPU orchestrated memory tiering for the big data era

CH Chang, J Han, A Sivasubramaniam… - Proceedings of the 29th …, 2024 - dl.acm.org
As the demand for processing larger datasets increases, GPUs need to reach deeper into
their (memory) hierarchy to directly access capacities that only storage systems (SSDs) can …

Register file prefetching

S Shukla, S Bandishte, J Gaur… - Proceedings of the 49th …, 2022 - dl.acm.org
The memory wall continues to limit the performance of modern out-of-order (OOO)
processors, despite the expensive provisioning of large multi-level caches and …

Base-victim compression: An opportunistic cache compression architecture

J Gaur, AR Alameldeen, S Subramoney - ACM SIGARCH Computer …, 2016 - dl.acm.org
The memory wall has motivated many enhancements to cache management policies aimed
at reducing misses. Cache compression has been proposed to increase effective cache …