Pythia: A customizable hardware prefetching framework using online reinforcement learning
Past research has proposed numerous hardware prefetching techniques, most of which rely
on exploiting one specific type of program context information (e.g., program counter …
The championship simulator: Architectural simulation for education and competition
Recent years have seen a dramatic increase in the microarchitectural complexity of
processors. This increase in complexity presents a twofold challenge for the field of …
Decoupled vector runahead
We present Decoupled Vector Runahead (DVR), an in-core prefetching technique,
executing separately to the main application thread, that exploits massive amounts of …
AfterImage: Leaking control flow data and tracking load operations via the hardware prefetcher
Research into processor-based side-channels has seen both a large number and a large
variety of disclosed vulnerabilities that can leak critical, private data to malicious attackers …
Hermes: Accelerating long-latency load requests via perceptron-based off-chip load prediction
Long-latency load requests continue to limit the performance of modern high-performance
processors. To increase the latency tolerance of a processor, architects have primarily relied …
Clip: Load criticality based data prefetching for bandwidth-constrained many-core systems
B Panda - Proceedings of the 56th Annual IEEE/ACM …, 2023 - dl.acm.org
Hardware prefetching is a latency-hiding technique that hides the costly off-chip DRAM
accesses. However, state-of-the-art prefetchers fail to deliver performance improvement in …
Effective mimicry of Belady's MIN policy
The past decade has seen the rise of highly successful cache replacement policies that are
based on binary prediction. For example, the Hawkeye policy learns whether lines loaded …
Micro-armed bandit: lightweight & reusable reinforcement learning for microarchitecture decision-making
Online Reinforcement Learning (RL) has been adopted as an effective mechanism in
various decision-making problems in microarchitecture. Its high adaptability and the ability to …
Berti: an accurate local-delta data prefetcher
Data prefetching is a technique that plays a crucial role in modern high-performance
processors by hiding long latency memory accesses. Several state-of-the-art hardware …
Snake: A variable-length chain-based prefetching for GPUs
Graphics Processing Units (GPUs) utilize memory hierarchy and Thread-Level Parallelism
(TLP) to tolerate off-chip memory latency, which is a significant bottleneck for memory-bound …