Cache-oblivious algorithms
This paper presents asymptotically optimal algorithms for rectangular matrix transpose, FFT,
and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms …
Space-filling curves: an introduction with applications in scientific computing
M Bader - 2012 - books.google.com
The present book provides an introduction to using space-filling curves (SFC) as tools in
scientific computing. Special focus is laid on the representation of SFC and on resulting …
Tiling optimizations for 3D scientific computations
G Rivera, CW Tseng - SC'00: Proceedings of the 2000 ACM …, 2000 - ieeexplore.ieee.org
Compiler transformations can significantly improve data locality for many scientific programs.
In this paper, we show iterative solvers for partial differential equations (PDEs) in three …
Cache-oblivious algorithms
This article presents asymptotically optimal algorithms for rectangular matrix transpose, fast
Fourier transform (FFT), and sorting on computers with multiple levels of caching. Unlike …
Recursive array layouts and fast parallel matrix multiplication
Matrix multiplication is an important kernel in linear algebra algorithms, and the performance
of both serial and parallel implementations is highly dependent on the memory system …
Optimizing graph algorithms for improved cache performance
We develop algorithmic optimizations to improve the cache performance of four fundamental
graph algorithms. We present a cache-oblivious implementation of the Floyd-Warshall …
Exact analysis of the cache behavior of nested loops
We develop from first principles an exact model of the behavior of loop nests executing in a
memory hierarchy, by using a nontraditional classification of misses that has the key property …
Cache oblivious search trees via binary trees of small height
We propose a version of cache oblivious search trees which is simpler than the previous
proposal of Bender, Demaine and Farach-Colton and has the same complexity bounds. In …
Data cache locking for higher program predictability
Caches have become increasingly important with the widening gap between main memory
and processor speeds. However, they are a source of unpredictability due to their …
Memory coloring: A compiler approach for scratchpad memory management
Scratchpad memory (SPM), a fast software-managed on-chip SRAM, is now widely used in
modern embedded processors. Compared to hardware-managed cache, it is more efficient …