Cache-oblivious algorithms

M Frigo, CE Leiserson, H Prokop… - … on Foundations of …, 1999 - ieeexplore.ieee.org
This paper presents asymptotically optimal algorithms for rectangular matrix transpose, FFT,
and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms …
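The abstract's central idea is divide and conquer that never consults a cache size. A minimal sketch of that idea for the rectangular transpose: recursively halve the larger dimension until a block is small, then copy it directly. The names `transpose`, `cache_oblivious_transpose`, and the `base` cutoff are hypothetical. `base` only limits recursion overhead and is not tied to any cache parameter. Python lists do not expose memory layout, so this shows the recursion structure rather than measured cache behavior.

```python
def transpose(a, b, r0, r1, c0, c1, base=16):
    """Write the transpose of the sub-block a[r0:r1][c0:c1] into b."""
    rows, cols = r1 - r0, c1 - c0
    if rows <= base and cols <= base:
        # Small block: copy element by element.
        for i in range(r0, r1):
            for j in range(c0, c1):
                b[j][i] = a[i][j]
    elif rows >= cols:
        # Split the longer (row) dimension in half.
        mid = r0 + rows // 2
        transpose(a, b, r0, mid, c0, c1, base)
        transpose(a, b, mid, r1, c0, c1, base)
    else:
        # Split the longer (column) dimension in half.
        mid = c0 + cols // 2
        transpose(a, b, r0, r1, c0, mid, base)
        transpose(a, b, r0, r1, mid, c1, base)


def cache_oblivious_transpose(a):
    """Return the transpose of an m x n list-of-lists matrix."""
    m = len(a)
    n = len(a[0]) if m else 0
    b = [[None] * m for _ in range(n)]
    transpose(a, b, 0, m, 0, n)
    return b
```

Splitting the longer side keeps sub-blocks roughly square. This is what lets the recursion eventually produce blocks that fit in every cache level without knowing their sizes.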

Recursive blocked algorithms and hybrid data structures for dense matrix library software

E Elmroth, F Gustavson, I Jonsson, B Kågström - SIAM review, 2004 - SIAM
Matrix computations are both fundamental and ubiquitous in computational science and its
vast application areas. Along with the development of more advanced computer systems …

Faster all-pairs shortest paths via circuit complexity

R Williams - Proceedings of the forty-sixth annual ACM symposium …, 2014 - dl.acm.org
We present a new randomized method for computing the min-plus product (aka, tropical
product) of two n × n matrices, yielding a faster algorithm for solving the all-pairs shortest …
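For reference, the min-plus (tropical) product is defined by (A ⊗ B)[i][j] = min over k of A[i][k] + B[k][j]. Squaring the edge-weight matrix under this product about log2(n) times yields all-pairs shortest path distances. The sketch below is the naive cubic definition, not the paper's randomized circuit-complexity method. `min_plus` and `all_pairs_shortest_paths` are hypothetical names. It assumes zero diagonal weights, `math.inf` for missing edges, and no negative cycles.

```python
import math

INF = math.inf  # weight used for "no edge"


def min_plus(a, b):
    """Naive O(n^3) min-plus product: C[i][j] = min_k a[i][k] + b[k][j]."""
    n = len(a)
    return [[min(a[i][k] + b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def all_pairs_shortest_paths(w):
    """Repeated min-plus squaring of w until paths of n - 1 edges are covered.

    Assumes w[i][i] == 0 and that the graph has no negative cycles.
    """
    n = len(w)
    d = w
    steps = 1  # d currently covers paths with at most `steps` edges
    while steps < n - 1:
        d = min_plus(d, d)
        steps *= 2
    return d
```

Any asymptotic speedup for computing `min_plus` carries over directly to the shortest-path computation, since that product is the only nontrivial step.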

Sparse matrix multiplication: The distributed block-compressed sparse row library

U Borštnik, J VandeVondele, V Weber, J Hutter - Parallel Computing, 2014 - Elsevier
Efficient parallel multiplication of sparse matrices is key to enabling many large-scale
calculations. This article presents the DBCSR (Distributed Block Compressed Sparse Row) …
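The block compressed sparse row idea stores only the nonzero dense sub-blocks of a matrix, indexed CSR-style by block row. A minimal serial sketch of that layout and a block-row-wise sparse product follows. It is not DBCSR's API. Distribution and parallel communication are left out. `BCSR`, `to_bcsr`, `bcsr_matmul`, and `blocks_to_dense` are hypothetical names, and the sketch assumes square `bs` x `bs` blocks with matrix dimensions that are multiples of `bs`.

```python
from dataclasses import dataclass


@dataclass
class BCSR:
    """Hypothetical minimal block-CSR container of dense bs x bs blocks."""
    bs: int        # block edge length
    nbr: int       # number of block rows
    nbc: int       # number of block columns
    row_ptr: list  # blocks of block row i are at indices row_ptr[i]:row_ptr[i+1]
    col_idx: list  # block-column index of each stored block
    blocks: list   # the stored dense blocks (lists of lists)


def to_bcsr(dense, bs):
    n_rows, n_cols = len(dense), len(dense[0])
    assert n_rows % bs == 0 and n_cols % bs == 0, "dims must be multiples of bs"
    nbr, nbc = n_rows // bs, n_cols // bs
    row_ptr, col_idx, blocks = [0], [], []
    for bi in range(nbr):
        for bj in range(nbc):
            blk = [row[bj * bs:(bj + 1) * bs] for row in dense[bi * bs:(bi + 1) * bs]]
            if any(v != 0 for r in blk for v in r):  # keep only nonzero blocks
                col_idx.append(bj)
                blocks.append(blk)
        row_ptr.append(len(col_idx))
    return BCSR(bs, nbr, nbc, row_ptr, col_idx, blocks)


def block_gemm_acc(c, a, b):
    """Accumulate c += a @ b for small dense square blocks."""
    bs = len(a)
    for i in range(bs):
        for k in range(bs):
            aik = a[i][k]
            if aik:
                for j in range(bs):
                    c[i][j] += aik * b[k][j]


def bcsr_matmul(A, B):
    """C = A @ B by block rows; returns a dict {(bi, bj): dense block}."""
    assert A.bs == B.bs and A.nbc == B.nbr
    bs = A.bs
    C = {}
    for bi in range(A.nbr):
        for p in range(A.row_ptr[bi], A.row_ptr[bi + 1]):
            bk = A.col_idx[p]  # A block (bi, bk) meets every B block in row bk
            for q in range(B.row_ptr[bk], B.row_ptr[bk + 1]):
                bj = B.col_idx[q]
                c = C.setdefault((bi, bj), [[0] * bs for _ in range(bs)])
                block_gemm_acc(c, A.blocks[p], B.blocks[q])
    return C


def blocks_to_dense(C, nbr, nbc, bs):
    out = [[0] * (nbc * bs) for _ in range(nbr * bs)]
    for (bi, bj), blk in C.items():
        for i in range(bs):
            for j in range(bs):
                out[bi * bs + i][bj * bs + j] = blk[i][j]
    return out
```

Working on whole blocks means the inner kernel is a small dense multiply, which is where a real library would call an optimized dense routine.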

Terrain simplification simplified: A general framework for view-dependent out-of-core visualization

P Lindstrom, V Pascucci - IEEE Transactions on Visualization …, 2002 - ieeexplore.ieee.org
We describe a general framework for out-of-core rendering and management of massive
terrain surfaces. The two key components of this framework are: view-dependent refinement …

LU-GPU: Efficient algorithms for solving dense linear systems on graphics hardware

N Galoppo, NK Govindaraju, M Henson… - SC'05: Proceedings …, 2005 - ieeexplore.ieee.org
We present a novel algorithm to solve dense linear systems using graphics processors
(GPUs). We reduce matrix decomposition and row operations to a series of rasterization …

Cache-oblivious algorithms

M Frigo, CE Leiserson, H Prokop… - ACM Transactions on …, 2012 - dl.acm.org
This article presents asymptotically optimal algorithms for rectangular matrix transpose, fast
Fourier transform (FFT), and sorting on computers with multiple levels of caching. Unlike …

Global static indexing for real-time exploration of very large regular grids

V Pascucci, RJ Frank - Proceedings of the 2001 ACM/IEEE Conference …, 2001 - dl.acm.org
In this paper we introduce a new indexing scheme for progressive traversal and
visualization of large regular grids. We demonstrate the potential of our approach by …

An extension of the StarSs programming model for platforms with multiple GPUs

E Ayguadé, RM Badia, FD Igual, J Labarta… - Euro-Par 2009 Parallel …, 2009 - Springer
While general-purpose homogeneous multi-core architectures are becoming ubiquitous,
there are clear indications that, for a number of important applications, a better …

Exact analysis of the cache behavior of nested loops

S Chatterjee, E Parker, PJ Hanlon, AR Lebeck - ACM SIGPLAN Notices, 2001 - dl.acm.org
We develop from first principles an exact model of the behavior of loop nests executing in a
memory hierarchy, by using a nontraditional classification of misses that has the key property …
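The paper derives miss counts analytically. The brute-force way to obtain exact counts for a small case is to replay the loop nest's address trace through a cache model. The sketch below does this for a direct-mapped cache and a two-level loop nest over a row-major n x n array. It is a simulation, not the paper's model or miss classification. `simulate_misses`, `row_major_trace`, and the cache geometry (`num_lines`, `line_size`, measured in array elements) are hypothetical.

```python
def simulate_misses(addresses, num_lines, line_size):
    """Count misses of a direct-mapped cache over a trace of element addresses."""
    tags = [None] * num_lines       # memory block currently held by each cache line
    misses = 0
    for addr in addresses:
        block = addr // line_size   # memory block containing this element
        slot = block % num_lines    # direct-mapped: exactly one candidate line
        if tags[slot] != block:
            misses += 1
            tags[slot] = block      # evict whatever was there
    return misses


def row_major_trace(n, order):
    """Addresses touched by a 2-deep loop nest over an n x n row-major array."""
    if order == "ij":  # i outer, j inner: unit-stride accesses
        return (i * n + j for i in range(n) for j in range(n))
    return (i * n + j for j in range(n) for i in range(n))  # stride-n accesses
```

With n = 64, 16 lines, and 8 elements per line, the unit-stride `"ij"` order misses 512 times (once per line), while the `"ji"` order misses on all 4096 accesses because every column maps into only two cache lines.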