A practical automatic polyhedral parallelizer and locality optimizer

U Bondhugula, A Hartono, J Ramanujam… - Proceedings of the 29th …, 2008 - dl.acm.org
We present the design and implementation of an automatic polyhedral source-to-source
transformation framework that can optimize regular programs (sequences of possibly …

[BOOK] Task scheduling for parallel systems

O Sinnen - 2007 - books.google.com
A new model for task scheduling that dramatically improves the efficiency of parallel systems.
Task scheduling for parallel systems can become a quagmire of heuristics, models, and …

Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model

U Bondhugula, M Baskaran, S Krishnamoorthy… - … CC 2008, Held as Part of …, 2008 - Springer
The polyhedral model provides powerful abstractions to optimize loop nests with regular
accesses. Affine transformations in this model capture a complex sequence of execution …

[HTML] High performance computing in satellite SAR interferometry: A critical perspective

P Imperatore, A Pepe, E Sansosti - Remote Sensing, 2021 - mdpi.com
Synthetic aperture radar (SAR) interferometry has rapidly evolved in the last decade and can
be considered today as a mature technology, which incorporates computationally intensive …

A survey of pipelined workflow scheduling: Models and algorithms

A Benoit, ÜV Çatalyürek, Y Robert… - ACM Computing Surveys …, 2013 - dl.acm.org
A large class of applications need to execute the same workflow on different datasets of
identical size. Efficient execution of such applications necessitates intelligent distribution of …

IOS: Inter-operator scheduler for CNN acceleration

Y Ding, L Zhu, Z Jia, G Pekhimenko… - … of Machine Learning …, 2021 - proceedings.mlsys.org
To accelerate CNN inference, existing deep learning frameworks focus on optimizing intra-
operator parallelization. However, a single operator can no longer fully utilize the available …

Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies

S Girbal, N Vasilache, C Bastoul, A Cohen… - International Journal of …, 2006 - Springer
Modern compilers are responsible for translating the idealistic operational semantics of the
source program into a form that makes efficient use of a highly complex heterogeneous …

Multi-dimensional rankings, program termination, and complexity bounds of flowchart programs

C Alias, A Darte, P Feautrier, L Gonnord - International Static Analysis …, 2010 - Springer
Proving the termination of a flowchart program can be done by exhibiting a ranking function,
i.e., a function from the program states to a well-founded set, which strictly decreases at each …

Iterative optimization in the polyhedral model: Part II, multidimensional time

LN Pouchet, C Bastoul, A Cohen, J Cavazos - ACM SIGPLAN Notices, 2008 - dl.acm.org
High-level loop optimizations are necessary to achieve good performance over a wide
variety of processors. Their performance impact can be significant because they involve in …

Whole-function vectorization

R Karrenberg - Automatic SIMD vectorization of SSA-based …, 2015 - Springer
In this chapter, we
present the main transformation phases of the Whole-Function Vectorization algorithm: Mask …