A survey of recent prefetching techniques for processor caches
S Mittal - ACM Computing Surveys (CSUR), 2016 - dl.acm.org
As the trends of process scaling make memory systems an even more crucial bottleneck, the
importance of latency hiding techniques such as prefetching grows further. However, naively …
importance of latency hiding techniques such as prefetching grows further. However, naively …
Pythia: A customizable hardware prefetching framework using online reinforcement learning
Past research has proposed numerous hardware prefetching techniques, most of which rely
on exploiting one specific type of program context information (eg, program counter …
on exploiting one specific type of program context information (eg, program counter …
Accelerating pointer chasing in 3D-stacked memory: Challenges, mechanisms, evaluation
Pointer chasing is a fundamental operation, used by many important data-intensive
applications (eg, databases, key-value stores, graph processing workloads) to traverse …
applications (eg, databases, key-value stores, graph processing workloads) to traverse …
High performance RDMA-based MPI implementation over InfiniBand
J Liu, J Wu, SP Kini, P Wyckoff, DK Panda - Proceedings of the 17th …, 2003 - dl.acm.org
Although InfiniBand Architecture is relatively new in the high performance computing area, it
offers many features which help us to improve the performance of communication …
offers many features which help us to improve the performance of communication …
Runahead execution: An alternative to very large instruction windows for out-of-order processors
Today's high performance processors tolerate long latency operations by means of out-of-
order execution. However, as latencies increase, the size of the instruction window must …
order execution. However, as latencies increase, the size of the instruction window must …
IMP: Indirect memory prefetcher
Machine learning, graph analytics and sparse linear algebra-based applications are
dominated by irregular memory accesses resulting from following edges in a graph or non …
dominated by irregular memory accesses resulting from following edges in a graph or non …
Multithreaded processors
T Ungerer, B Robič, J Šilc - The Computer Journal, 2002 - academic.oup.com
The instruction-level parallelism found in a conventional instruction stream is limited. Studies
have shown the limits of processor utilization even for today's superscalar microprocessors …
have shown the limits of processor utilization even for today's superscalar microprocessors …
Prodigy: Improving the memory latency of data-indirect irregular workloads using hardware-software co-design
Irregular workloads are typically bottlenecked by the memory system. These workloads often
use sparse data representations, eg, compressed sparse row/column (CSR/CSC), to …
use sparse data representations, eg, compressed sparse row/column (CSR/CSC), to …
When prefetching works, when it doesn't, and why
In emerging and future high-end processor systems, tolerating increasing cache miss
latency and properly managing memory bandwidth will be critical to achieving high …
latency and properly managing memory bandwidth will be critical to achieving high …
Microarchitecture optimizations for exploiting memory-level parallelism
Y Chou, B Fahs, S Abraham - ACM SIGARCH Computer Architecture …, 2004 - dl.acm.org
The performance of memory-bound commercial applicationssuch as databases is limited by
increasing memory latencies. Inthis paper, we show that exploiting memory-level parallelism …
increasing memory latencies. Inthis paper, we show that exploiting memory-level parallelism …