The sparse polyhedral framework: Composing compiler-generated inspector-executor code

MM Strout, M Hall, C Olschanowsky - Proceedings of the IEEE, 2018 - ieeexplore.ieee.org
Irregular applications such as big graph analysis, material simulations, molecular dynamics
simulations, and finite element analysis have performance problems due to their use of …

[BOOK][B] Iterative methods for sparse linear systems

Y Saad - 2003 - SIAM
In the six years that have passed since the publication of the first edition of this book,
iterative methods for linear systems have made good progress in scientific and engineering …

Some efficient solutions to the affine scheduling problem. I. One-dimensional time

P Feautrier - International Journal of Parallel Programming, 1992 - Springer
Programs and systems of recurrence equations may be represented as sets of actions which
are to be executed subject to precedence constraints. In many cases, actions may be labelled …

[HTML][HTML] Iterative solution of linear systems in the 20th century

Y Saad, HA Van Der Vorst - Journal of Computational and Applied …, 2000 - Elsevier
This paper sketches the main research developments in the area of iterative methods for
solving linear systems during the 20th century. Although iterative methods for solving linear …

The LRPD test: Speculative run-time parallelization of loops with privatization and reduction parallelization

L Rauchwerger, D Padua - Proceedings of the ACM SIGPLAN 1995 …, 1995 - dl.acm.org
Current parallelizing compilers cannot identify a significant fraction of parallelizable loops
because they have complex or statically insufficiently defined access patterns. As …

[BOOK][B] Automatic performance tuning of sparse matrix kernels

RW Vuduc - 2003 - search.proquest.com
This dissertation presents an automated system to generate highly efficient, platform-
adapted implementations of sparse matrix kernels. We show that conventional …

Automatic CPU-GPU communication management and optimization

TB Jablin, P Prabhu, JA Jablin, NP Johnson… - Proceedings of the …, 2011 - dl.acm.org
The performance benefits of GPU parallelism can be enormous, but unlocking this
performance potential is challenging. The applicability and performance of GPU …

Tempest and Typhoon: User-level shared memory

SK Reinhardt, JR Larus, DA Wood - Proceedings of the 21st annual …, 1994 - dl.acm.org
Future parallel computers must efficiently execute not only hand-coded applications but also
programs written in high-level, parallel programming languages. Today's machines limit …

A survey on thread-level speculation techniques

A Estebanez, DR Llanos… - ACM Computing Surveys …, 2016 - dl.acm.org
Thread-Level Speculation (TLS) is a promising technique that allows the parallel execution
of sequential code without relying on a prior, compile-time dependence analysis. In this …

[PDF][PDF] Parallel solution of sparse triangular linear systems in the preconditioned iterative methods on the GPU

M Naumov - NVIDIA Corp., Westford, MA, USA, Tech. Rep …, 2011 - research.nvidia.com
A novel algorithm for solving in parallel a sparse triangular linear system on a graphical
processing unit is proposed. It implements the solution of the triangular system in two …