A catalog of stream processing optimizations
Various research communities have independently arrived at stream processing as a
programming model for efficient and parallel computing. These communities include digital …
Recursive blocked algorithms and hybrid data structures for dense matrix library software
E Elmroth, F Gustavson, I Jonsson, B Kågström - SIAM review, 2004 - SIAM
Matrix computations are both fundamental and ubiquitous in computational science and its
vast application areas. Along with the development of more advanced computer systems …
Scaling distributed machine learning with the parameter server
We propose a parameter server framework for distributed machine learning problems. Both
data and workloads are distributed over worker nodes, while the server nodes maintain …
Ernest: Efficient performance prediction for Large-Scale advanced analytics
Recent workload trends indicate rapid growth in the deployment of machine learning,
genomics and scientific workloads on cloud computing infrastructure. However, efficiently …
[BOOK][B] Accuracy and stability of numerical algorithms
NJ Higham - 2002 - SIAM
In the nearly seven years since I finished writing the first edition of this book research on the
accuracy and stability of numerical algorithms has continued to flourish and mature. Our …
[BOOK][B] The method of moments in electromagnetics
WC Gibson - 2021 - taylorfrancis.com
The Method of Moments in Electromagnetics, Third Edition details the numerical solution of
electromagnetic integral equations via the Method of Moments (MoM). Previous editions …
Large-scale deep unsupervised learning using graphics processors
The promise of unsupervised learning methods lies in their potential to use vast amounts of
unlabeled data to learn complex, highly nonlinear models with millions of free parameters …
Auto-tuning a high-level language targeted to GPU codes
Determining the best set of optimizations to apply to a kernel to be executed on the graphics
processing unit (GPU) is a challenging problem. There are large sets of possible …
The LINPACK benchmark: past, present and future
JJ Dongarra, P Luszczek… - … and Computation: practice …, 2003 - Wiley Online Library
This paper describes the LINPACK Benchmark and some of its variations commonly used to
assess the performance of computer systems. Aside from the LINPACK Benchmark suite, the …
A practical automatic polyhedral parallelizer and locality optimizer
U Bondhugula, A Hartono, J Ramanujam… - Proceedings of the 29th …, 2008 - dl.acm.org
We present the design and implementation of an automatic polyhedral source-to-source
transformation framework that can optimize regular programs (sequences of possibly …