An overview of cache optimization techniques and cache-aware numerical algorithms

M Kowarschik, C Weiß - Algorithms for memory hierarchies: advanced …, 2003 - Springer
In order to mitigate the impact of the growing gap between CPU speed and main memory
performance, today's computer architectures implement hierarchical memory structures. The …

Gossip-based computation of aggregate information

D Kempe, A Dobra, J Gehrke - 44th Annual IEEE Symposium …, 2003 - ieeexplore.ieee.org
Over the last decade, we have seen a revolution in connectivity between computers, and a
resulting paradigm shift from centralized to highly distributed systems. With massive scale …

Tile size selection using cache organization and data layout

S Coleman, KS McKinley - ACM SIGPLAN Notices, 1995 - dl.acm.org
When dense matrix computations are too large to fit in cache, previous research proposes
tiling to reduce or eliminate capacity misses. This paper presents a new algorithm for …

Impulse: Building a smarter memory controller

J Carter, W Hsieh, L Stoller, M Swanson… - … Symposium on High …, 1999 - ieeexplore.ieee.org
Impulse is a new memory system architecture that adds two important features to a
traditional memory controller. First, Impulse supports application-specific optimizations …

CHiLL: A framework for composing high-level loop transformations

C Chen, J Chame, M Hall - 2008 - Citeseer
This paper describes a general and robust loop transformation framework that enables
compilers to generate efficient code on complex loop nests. Despite two decades of prior …

A data cache with multiple caching strategies tuned to different types of locality

A González, C Aliagas, M Valero - ACM International Conference on …, 1995 - dl.acm.org
Current data cache organizations fail to deliver high performance in scalar processors for
many vector applications. There are two main reasons for this loss of performance: the use …

Cache miss equations: a compiler framework for analyzing and tuning memory behavior

S Ghosh, M Martonosi, S Malik - ACM Transactions on Programming …, 1999 - dl.acm.org
With the ever-widening performance gap between processors and main memory, cache
memory, which is used to bridge this gap, is becoming more and more significant. Caches …

Tiling optimizations for 3D scientific computations

G Rivera, CW Tseng - SC'00: Proceedings of the 2000 ACM …, 2000 - ieeexplore.ieee.org
Compiler transformations can significantly improve data locality for many scientific programs.
In this paper, we show iterative solvers for partial differential equations (PDEs) in three …

Reuse distance as a metric for cache behavior

K Beyls, E D'Hollander - Proceedings of the IASTED Conference on …, 2001 - Citeseer
The widening gap between memory and processor speed causes more and more programs
to shift from CPU-bounded to memory speed-bounded, even in the presence of multi-level …

Data and computation transformations for multiprocessors

JM Anderson, SP Amarasinghe, MS Lam - ACM SIGPLAN Notices, 1995 - dl.acm.org
Effective memory hierarchy utilization is critical to the performance of modern multiprocessor
architectures. We have developed the first compiler system that fully automatically …