Cache-oblivious algorithms
This paper presents asymptotically optimal algorithms for rectangular matrix transpose, FFT,
and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms …
Recursive blocked algorithms and hybrid data structures for dense matrix library software
Matrix computations are both fundamental and ubiquitous in computational science and its
vast application areas. Along with the development of more advanced computer systems …
Faster all-pairs shortest paths via circuit complexity
R Williams - Proceedings of the forty-sixth annual ACM symposium …, 2014 - dl.acm.org
We present a new randomized method for computing the min-plus product (aka, tropical
product) of two n × n matrices, yielding a faster algorithm for solving the all-pairs shortest …
Sparse matrix multiplication: The distributed block-compressed sparse row library
Efficient parallel multiplication of sparse matrices is key to enabling many large-scale
calculations. This article presents the DBCSR (Distributed Block Compressed Sparse Row) …
Terrain simplification simplified: A general framework for view-dependent out-of-core visualization
We describe a general framework for out-of-core rendering and management of massive
terrain surfaces. The two key components of this framework are: view-dependent refinement …
LU-GPU: Efficient algorithms for solving dense linear systems on graphics hardware
We present a novel algorithm to solve dense linear systems using graphics processors
(GPUs). We reduce matrix decomposition and row operations to a series of rasterization …
Cache-oblivious algorithms
This article presents asymptotically optimal algorithms for rectangular matrix transpose, fast
Fourier transform (FFT), and sorting on computers with multiple levels of caching. Unlike …
Global static indexing for real-time exploration of very large regular grids
In this paper we introduce a new indexing scheme for progressive traversal and
visualization of large regular grids. We demonstrate the potential of our approach by …
An extension of the StarSs programming model for platforms with multiple GPUs
While general-purpose homogeneous multi-core architectures are becoming ubiquitous,
there are clear indications that, for a number of important applications, a better …
Exact analysis of the cache behavior of nested loops
We develop from first principles an exact model of the behavior of loop nests executing in a
memory hierarchy, by using a nontraditional classification of misses that has the key property …