An overview of cache optimization techniques and cache-aware numerical algorithms
M Kowarschik, C Weiß - Algorithms for memory hierarchies: advanced …, 2003 - Springer
In order to mitigate the impact of the growing gap between CPU speed and main memory
performance, today's computer architectures implement hierarchical memory structures. The …
[BOOK][B] Parallel computer architecture: a hardware/software approach
The most exciting development in parallel computer architecture is the convergence of
traditionally disparate approaches on a common machine structure. This book explains the …
[BOOK][B] Memory systems: cache, DRAM, disk
B Jacob, D Wang, S Ng - 2010 - books.google.com
Is your memory hierarchy stopping your microprocessor from performing at the high level it
should be? Memory Systems: Cache, DRAM, Disk shows you how to resolve this problem …
[PDF][PDF] Database architecture optimized for the new bottleneck: Memory access
In the past decade, advances in speed of commodity CPUs have far out-paced advances in
memory latency. Main-memory access is therefore increasingly a performance bottleneck for …
Compiler-based prefetching for recursive data structures
Software-controlled data prefetching offers the potential for bridging the ever-increasing
speed gap between the memory subsystem and today's high-performance processors …
Meta optimization: Improving compiler heuristics with machine learning
M Stephenson, S Amarasinghe, M Martin… - ACM sigplan …, 2003 - dl.acm.org
Compiler writers have crafted many heuristics over the years to approximately solve NP-
hard problems efficiently. Finding a heuristic that performs well on a broad range of …
Improving hash join performance through prefetching
Hash join algorithms suffer from extensive CPU cache stalls. This article shows that the
standard hash join algorithm for disk-oriented databases (i.e., GRACE) spends over 80% of its …
Optimizing main-memory join on modern hardware
In the past decade, the exponential growth in commodity CPU's speed has far outpaced
advances in memory latency. A second trend is that CPU performance advances are not …
Tolerating memory latency through software-controlled pre-execution in simultaneous multithreading processors
CK Luk - Proceedings of the 28th annual international …, 2001 - dl.acm.org
Hardly predictable data addresses in many irregular applications have rendered prefetching
ineffective. In many cases, the only accurate way to predict these addresses is to directly …
Improving the memory-system performance of sparse-matrix vector multiplication
S Toledo - IBM Journal of research and development, 1997 - ieeexplore.ieee.org
Sparse-matrix vector multiplication is an important kernel that often runs inefficiently on
superscalar RISC processors. This paper describes techniques that increase instruction …