There's plenty of room at the Top: What will drive computer performance after Moore's law?

CE Leiserson, NC Thompson, JS Emer, BC Kuszmaul… - Science, 2020 - science.org
BACKGROUND Improvements in computing power can claim a large share of the credit for
many of the things that we take for granted in our modern lives: cellphones that are more …

GROMACS 3.0: a package for molecular simulation and trajectory analysis

E Lindahl, B Hess, D Van Der Spoel - Molecular modeling annual, 2001 - Springer
GROMACS 3.0 is the latest release of a versatile and very well optimized package for
molecular simulation. Much effort has been devoted to achieving extremely high …

The design and implementation of FFTW3

M Frigo, SG Johnson - Proceedings of the IEEE, 2005 - ieeexplore.ieee.org
FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the
hardware in order to maximize performance. This paper shows that such an approach can …

Gossip-based computation of aggregate information

D Kempe, A Dobra, J Gehrke - 44th Annual IEEE Symposium …, 2003 - ieeexplore.ieee.org
Over the last decade, we have seen a revolution in connectivity between computers, and a
resulting paradigm shift from centralized to highly distributed systems. With massive scale …

Automated empirical optimizations of software and the ATLAS project

RC Whaley, A Petitet, JJ Dongarra - Parallel computing, 2001 - Elsevier
This paper describes the automatically tuned linear algebra software (ATLAS) project, as
well as the fundamental principles that underlie it. ATLAS is an instantiation of a new …

A survey on smartphone-based systems for opportunistic user context recognition

SA Hoseini-Tabatabaei, A Gluhak… - ACM Computing Surveys …, 2013 - dl.acm.org
The ever-growing computation and storage capability of mobile phones has given rise to
mobile-centric context recognition systems, which are able to sense and analyze the context …

Cache-oblivious algorithms

M Frigo, CE Leiserson, H Prokop… - … on Foundations of …, 1999 - ieeexplore.ieee.org
This paper presents asymptotically optimal algorithms for rectangular matrix transpose, FFT,
and sorting on computers with multiple levels of caching. Unlike previous optimal algorithms …

SPIRAL: Code generation for DSP transforms

M Puschel, JMF Moura, JR Johnson… - Proceedings of the …, 2005 - ieeexplore.ieee.org
Fast changing, increasingly complex, and diverse computing platforms pose central
problems in scientific computing: How to achieve, with reasonable effort, portable optimal …

Spatially-stationary model for holographic MIMO small-scale fading

A Pizzo, TL Marzetta… - IEEE Journal on Selected …, 2020 - ieeexplore.ieee.org
Imagine an array with a massive (possibly uncountably infinite) number of antennas in a
compact space. We refer to a system of this sort as Holographic MIMO. Given the impressive …

Auto-tuning a high-level language targeted to GPU codes

S Grauer-Gray, L Xu, R Searles… - 2012 innovative …, 2012 - ieeexplore.ieee.org
Determining the best set of optimizations to apply to a kernel to be executed on the graphics
processing unit (GPU) is a challenging problem. There are large sets of possible …