A catalog of stream processing optimizations

M Hirzel, R Soulé, S Schneider, B Gedik… - ACM Computing Surveys …, 2014 - dl.acm.org
Various research communities have independently arrived at stream processing as a
programming model for efficient and parallel computing. These communities include digital …

Recursive blocked algorithms and hybrid data structures for dense matrix library software

E Elmroth, F Gustavson, I Jonsson, B Kågström - SIAM review, 2004 - SIAM
Matrix computations are both fundamental and ubiquitous in computational science and its
vast application areas. Along with the development of more advanced computer systems …

Scaling distributed machine learning with the parameter server

M Li, DG Andersen, JW Park, AJ Smola… - … USENIX Symposium on …, 2014 - usenix.org
We propose a parameter server framework for distributed machine learning problems. Both
data and workloads are distributed over worker nodes, while the server nodes maintain …

Ernest: Efficient performance prediction for Large-Scale advanced analytics

S Venkataraman, Z Yang, M Franklin, B Recht… - … USENIX Symposium on …, 2016 - usenix.org
Recent workload trends indicate rapid growth in the deployment of machine learning,
genomics and scientific workloads on cloud computing infrastructure. However, efficiently …

[BOOK][B] Accuracy and stability of numerical algorithms

NJ Higham - 2002 - SIAM
In the nearly seven years since I finished writing the first edition of this book research on the
accuracy and stability of numerical algorithms has continued to flourish and mature. Our …

[BOOK][B] The method of moments in electromagnetics

WC Gibson - 2021 - taylorfrancis.com
The Method of Moments in Electromagnetics, Third Edition details the numerical solution of
electromagnetic integral equations via the Method of Moments (MoM). Previous editions …

Large-scale deep unsupervised learning using graphics processors

R Raina, A Madhavan, AY Ng - Proceedings of the 26th annual …, 2009 - dl.acm.org
The promise of unsupervised learning methods lies in their potential to use vast amounts of
unlabeled data to learn complex, highly nonlinear models with millions of free parameters …

Auto-tuning a high-level language targeted to GPU codes

S Grauer-Gray, L Xu, R Searles… - 2012 innovative …, 2012 - ieeexplore.ieee.org
Determining the best set of optimizations to apply to a kernel to be executed on the graphics
processing unit (GPU) is a challenging problem. There are large sets of possible …

The LINPACK benchmark: past, present and future

JJ Dongarra, P Luszczek… - … and Computation: practice …, 2003 - Wiley Online Library
This paper describes the LINPACK Benchmark and some of its variations commonly used to
assess the performance of computer systems. Aside from the LINPACK Benchmark suite, the …

A practical automatic polyhedral parallelizer and locality optimizer

U Bondhugula, A Hartono, J Ramanujam… - Proceedings of the 29th …, 2008 - dl.acm.org
We present the design and implementation of an automatic polyhedral source-to-source
transformation framework that can optimize regular programs (sequences of possibly …