[BOOK] Memory systems: cache, DRAM, disk

B Jacob, D Wang, S Ng - 2010 - books.google.com
Is your memory hierarchy stopping your microprocessor from performing at the high level it
should be? Memory Systems: Cache, DRAM, Disk shows you how to resolve this problem …

A Comprehensive Survey of Benchmarks for Improvement of Software's Non-Functional Properties

A Blot, J Petke - ACM Computing Surveys, 2025 - dl.acm.org
Despite a recent increase in research on improving non-functional properties of
software, such as energy usage or program size, there is a lack of standard benchmarks for …

I-spy: Context-driven conditional instruction prefetching with coalescing

TA Khan, A Sriraman, J Devietti… - 2020 53rd Annual …, 2020 - ieeexplore.ieee.org
Modern data center applications have rapidly expanding instruction footprints that lead to
frequent instruction cache misses, increasing cost and degrading data center performance …

Proactive instruction fetch

M Ferdman, C Kaynak, B Falsafi - Proceedings of the 44th Annual IEEE …, 2011 - dl.acm.org
Fast access requirements preclude building L1 instruction caches large enough to capture
the working set of server workloads. Efforts exist to mitigate limited L1 instruction cache …

Twig: Profile-guided BTB prefetching for data center applications

TA Khan, N Brown, A Sriraman… - MICRO-54: 54th Annual …, 2021 - dl.acm.org
Modern data center applications have deep software stacks, with instruction footprints that
are orders of magnitude larger than typical instruction cache (I-cache) sizes. To efficiently …

Propeller: A profile guided, relinking optimizer for warehouse-scale applications

H Shen, K Pszeniczny, R Lavaee, S Kumar… - Proceedings of the 28th …, 2023 - dl.acm.org
While profile-guided optimizations (PGO) and link-time optimizations (LTO) have been
widely adopted, post-link optimizations (PLO) have languished until recently when …

Boomerang: A metadata-free architecture for control flow delivery

R Kumar, CC Huang, B Grot… - 2017 IEEE International …, 2017 - ieeexplore.ieee.org
Contemporary server workloads feature massive instruction footprints stemming from deep,
layered software stacks. The active instruction working set of the entire stack can easily …

Thermometer: profile-guided BTB replacement for data center applications

S Song, TA Khan, SM Shahri, A Sriraman… - Proceedings of the 49th …, 2022 - dl.acm.org
Modern processors employ a decoupled frontend with Fetch Directed Instruction Prefetching
(FDIP) to avoid frontend stalls in data center applications. However, the large branch …

Temporal instruction fetch streaming

M Ferdman, TF Wenisch, A Ailamaki… - 2008 41st IEEE/ACM …, 2008 - ieeexplore.ieee.org
L1 instruction-cache misses pose a critical performance bottleneck in commercial server
workloads. Cache access latency constraints preclude L1 instruction caches large enough …

RDIP: Return-address-stack directed instruction prefetching

A Kolli, A Saidi, TF Wenisch - Proceedings of the 46th Annual IEEE/ACM …, 2013 - dl.acm.org
L1 instruction fetch misses remain a critical performance bottleneck, accounting for up to
40% slowdowns in server applications. Whereas instruction footprints typically fit within last …