There's plenty of room at the Top: What will drive computer performance after Moore's law?
BACKGROUND Improvements in computing power can claim a large share of the credit for
many of the things that we take for granted in our modern lives: cellphones that are more …
GROMACS 3.0: a package for molecular simulation and trajectory analysis
GROMACS 3.0 is the latest release of a versatile and very well optimized package for
molecular simulation. Much effort has been devoted to achieving extremely high …
The design and implementation of FFTW3
FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the
hardware in order to maximize performance. This paper shows that such an approach can …
Gossip-based computation of aggregate information
Over the last decade, we have seen a revolution in connectivity between computers, and a
resulting paradigm shift from centralized to highly distributed systems. With massive scale …
Automated empirical optimizations of software and the ATLAS project
This paper describes the automatically tuned linear algebra software (ATLAS) project, as
well as the fundamental principles that underlie it. ATLAS is an instantiation of a new …
A survey on smartphone-based systems for opportunistic user context recognition
The ever-growing computation and storage capabilities of mobile phones have given rise to
mobile-centric context recognition systems, which are able to sense and analyze the context …
Cache-oblivious algorithms
This paper presents asymptotically optimal algorithms for rectangular matrix transpose, FFT,
and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms …
SPIRAL: Code generation for DSP transforms
Fast-changing, increasingly complex, and diverse computing platforms pose central
problems in scientific computing: How to achieve, with reasonable effort, portable optimal …
Spatially-stationary model for holographic MIMO small-scale fading
Imagine an array with a massive (possibly uncountably infinite) number of antennas in a
compact space. We refer to a system of this sort as Holographic MIMO. Given the impressive …
Auto-tuning a high-level language targeted to GPU codes
Determining the best set of optimizations to apply to a kernel to be executed on the graphics
processing unit (GPU) is a challenging problem. There are large sets of possible …