A survey of recent prefetching techniques for processor caches

S Mittal - ACM Computing Surveys (CSUR), 2016 - dl.acm.org
As the trends of process scaling make memory systems an even more crucial bottleneck, the
importance of latency hiding techniques such as prefetching grows further. However, naively …

DAMOV: A new methodology and benchmark suite for evaluating data movement bottlenecks

GF Oliveira, J Gómez-Luna, L Orosa, S Ghose… - IEEE …, 2021 - ieeexplore.ieee.org
Data movement between the CPU and main memory is a first-order obstacle against improving
performance, scalability, and energy efficiency in modern systems. Computer systems …

Pythia: A customizable hardware prefetching framework using online reinforcement learning

R Bera, K Kanellopoulos, A Nori, T Shahroodi… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
Past research has proposed numerous hardware prefetching techniques, most of which rely
on exploiting one specific type of program context information (e.g., program counter …

Accelerating pointer chasing in 3D-stacked memory: Challenges, mechanisms, evaluation

K Hsieh, S Khan, N Vijaykumar… - 2016 IEEE 34th …, 2016 - ieeexplore.ieee.org
Pointer chasing is a fundamental operation, used by many important data-intensive
applications (e.g., databases, key-value stores, graph processing workloads) to traverse …

Multithreaded processors

T Ungerer, B Robič, J Šilc - The Computer Journal, 2002 - academic.oup.com
The instruction-level parallelism found in a conventional instruction stream is limited. Studies
have shown the limits of processor utilization even for today's superscalar microprocessors …

IMP: Indirect memory prefetcher

X Yu, CJ Hughes, N Satish, S Devadas - Proceedings of the 48th …, 2015 - dl.acm.org
Machine learning, graph analytics and sparse linear algebra-based applications are
dominated by irregular memory accesses resulting from following edges in a graph or non …

[BOOK][B] Mikrocontroller und Mikroprozessoren

U Brinkschulte, T Ungerer - 2010 - Springer

Prodigy: Improving the memory latency of data-indirect irregular workloads using hardware-software co-design

N Talati, K May, A Behroozi, Y Yang… - … Symposium on High …, 2021 - ieeexplore.ieee.org
Irregular workloads are typically bottlenecked by the memory system. These workloads often
use sparse data representations, e.g., compressed sparse row/column (CSR/CSC), to …

Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors

CK Luk - Proceedings of the 28th annual international …, 2001 - dl.acm.org
Hardly predictable data addresses in many irregular applications have rendered prefetching
ineffective. In many cases, the only accurate way to predict these addresses is to directly …

Microarchitecture optimizations for exploiting memory-level parallelism

Y Chou, B Fahs, S Abraham - ACM SIGARCH Computer Architecture …, 2004 - dl.acm.org
The performance of memory-bound commercial applications such as databases is limited by
increasing memory latencies. In this paper, we show that exploiting memory-level parallelism …