A survey of recent prefetching techniques for processor caches
S Mittal - ACM Computing Surveys (CSUR), 2016 - dl.acm.org
As the trends of process scaling make memory systems an even more crucial bottleneck, the
importance of latency hiding techniques such as prefetching grows further. However, naively …
DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks
Data movement between the CPU and main memory is a first-order obstacle against improving
performance, scalability, and energy efficiency in modern systems. Computer systems …
Pythia: A customizable hardware prefetching framework using online reinforcement learning
Past research has proposed numerous hardware prefetching techniques, most of which rely
on exploiting one specific type of program context information (e.g., program counter …
Accelerating pointer chasing in 3D-stacked memory: Challenges, mechanisms, evaluation
Pointer chasing is a fundamental operation, used by many important data-intensive
applications (e.g., databases, key-value stores, graph processing workloads) to traverse …
Multithreaded processors
T Ungerer, B Robič, J Šilc - The Computer Journal, 2002 - academic.oup.com
The instruction-level parallelism found in a conventional instruction stream is limited. Studies
have shown the limits of processor utilization even for today's superscalar microprocessors …
IMP: Indirect memory prefetcher
Machine learning, graph analytics and sparse linear algebra-based applications are
dominated by irregular memory accesses resulting from following edges in a graph or non …
[BOOK][B] Mikrocontroller und Mikroprozessoren
U Brinkschulte, T Ungerer - 2010 - Springer
Prodigy: Improving the memory latency of data-indirect irregular workloads using hardware-software co-design
Irregular workloads are typically bottlenecked by the memory system. These workloads often
use sparse data representations, e.g., compressed sparse row/column (CSR/CSC), to …
Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors
CK Luk - Proceedings of the 28th annual international …, 2001 - dl.acm.org
Hardly predictable data addresses in many irregular applications have rendered prefetching
ineffective. In many cases, the only accurate way to predict these addresses is to directly …
Microarchitecture optimizations for exploiting memory-level parallelism
Y Chou, B Fahs, S Abraham - ACM SIGARCH Computer Architecture …, 2004 - dl.acm.org
The performance of memory-bound commercial applications such as databases is limited by
increasing memory latencies. In this paper, we show that exploiting memory-level parallelism …