A practical automatic polyhedral parallelizer and locality optimizer
We present the design and implementation of an automatic polyhedral source-to-source
transformation framework that can optimize regular programs (sequences of possibly …
[BOOK] Task scheduling for parallel systems
O Sinnen - 2007 - books.google.com
A new model for task scheduling that dramatically improves the efficiency of parallel systems
Task scheduling for parallel systems can become a quagmire of heuristics, models, and …
Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model
The polyhedral model provides powerful abstractions to optimize loop nests with regular
accesses. Affine transformations in this model capture a complex sequence of execution …
High performance computing in satellite SAR interferometry: A critical perspective
Synthetic aperture radar (SAR) interferometry has rapidly evolved in the last decade and can
be considered today as a mature technology, which incorporates computationally intensive …
IOS: Inter-operator scheduler for CNN acceleration
To accelerate CNN inference, existing deep learning frameworks focus on optimizing intra-
operator parallelization. However, a single operator can no longer fully utilize the available …
Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies
Modern compilers are responsible for translating the idealistic operational semantics of the
source program into a form that makes efficient use of a highly complex heterogeneous …
Multi-dimensional rankings, program termination, and complexity bounds of flowchart programs
Proving the termination of a flowchart program can be done by exhibiting a ranking function,
i.e., a function from the program states to a well-founded set, which strictly decreases at each …
A survey of pipelined workflow scheduling: Models and algorithms
A large class of applications need to execute the same workflow on different datasets of
identical size. Efficient execution of such applications necessitates intelligent distribution of …
Whole-function vectorization
R Karrenberg - Automatic SIMD vectorization of SSA-based …, 2015 - Springer
In this chapter, we
present the main transformation phases of the Whole-Function Vectorization algorithm: Mask …
Iterative optimization in the polyhedral model: Part II, multidimensional time
High-level loop optimizations are necessary to achieve good performance over a wide
variety of processors. Their performance impact can be significant because they involve in …