A survey of recent prefetching techniques for processor caches

S Mittal - ACM Computing Surveys (CSUR), 2016 - dl.acm.org
As the trends of process scaling make memory systems an even more crucial bottleneck, the
importance of latency hiding techniques such as prefetching grows further. However, naively …

Performance cloning: A technique for disseminating proprietary applications as benchmarks

A Joshi, L Eeckhout, RH Bell… - 2006 IEEE International …, 2006 - ieeexplore.ieee.org
Many embedded real world applications are intellectual property, and vendors hesitate to
share these proprietary applications with computer architects and designers. This poses a …

Fast thread migration via cache working set prediction

JA Brown, L Porter, DM Tullsen - 2011 IEEE 17th International …, 2011 - ieeexplore.ieee.org
The most significant source of lost performance when a thread migrates between cores is
the loss of cache state. A significant boost in post-migration performance is possible if the …

Prefetch injection based on hardware monitoring and object metadata

AR Adl-Tabatabai, RL Hudson, MJ Serrano… - ACM SIGPLAN …, 2004 - dl.acm.org
Cache miss stalls hurt performance because of the large gap between memory and
processor speeds; for example, the popular server benchmark SPEC JBB2000 spends 45 …

Distilling the essence of proprietary workloads into miniature benchmarks

A Joshi, L Eeckhout, RH Bell Jr, LK John - ACM Transactions on …, 2008 - dl.acm.org
Benchmarks set standards for innovation in computer architecture research and industry
product development. Consequently, it is of paramount importance that these workloads are …

On the limits of leakage power reduction in caches

Y Meng, T Sherwood, R Kastner - … International Symposium on …, 2005 - ieeexplore.ieee.org
If current technology scaling trends hold, leakage power dissipation soon becomes the
dominant source of power consumption. Caches, due to the fact that they account for the …

A decoupled predictor-directed stream prefetching architecture

S Sair, T Sherwood, B Calder - IEEE Transactions on …, 2003 - ieeexplore.ieee.org
An effective method for reducing the effect of load latency in modern processors is data
prefetching. One form of hardware-based data prefetching, stream buffers, has been shown …

Scheduling for energy efficiency and fault tolerance in hard real-time systems

Y Liu, H Liang, K Wu - 2010 Design, Automation & Test in …, 2010 - ieeexplore.ieee.org
This paper studies the dilemma between fault tolerance and energy efficiency in
frame-based real-time systems. Given a set of K tasks to be executed on a system that supports L …

[BOOK] Constructing adaptable and scalable synthetic benchmarks for microprocessor performance evaluation

AM Joshi - 2007 - search.proquest.com
Benchmarks set standards for innovation in computer architecture research and industry
product development. Consequently, it is of paramount importance that the benchmarks …

Data prefetching in a cache hierarchy with high bandwidth and capacity

LM Ramos, JL Briz, PE Ibáñez, V Viñals - ACM SIGARCH Computer …, 2007 - dl.acm.org
In this paper we evaluate four hardware data prefetchers in the context of a
high-performance three-level on chip cache hierarchy with high bandwidth and capacity. We …