A practical automatic polyhedral parallelizer and locality optimizer

U Bondhugula, A Hartono, J Ramanujam… - Proceedings of the 29th …, 2008 - dl.acm.org
We present the design and implementation of an automatic polyhedral source-to-source
transformation framework that can optimize regular programs (sequences of possibly …

[BOOK] Task scheduling for parallel systems

O Sinnen - 2007 - books.google.com
A new model for task scheduling that dramatically improves the efficiency of parallel systems.
Task scheduling for parallel systems can become a quagmire of heuristics, models, and …

Automatic transformations for communication-minimized parallelization and locality optimization in the polyhedral model

U Bondhugula, M Baskaran, S Krishnamoorthy… - … CC 2008, Held as Part of …, 2008 - Springer
The polyhedral model provides powerful abstractions to optimize loop nests with regular
accesses. Affine transformations in this model capture a complex sequence of execution …

High performance computing in satellite SAR interferometry: A critical perspective

P Imperatore, A Pepe, E Sansosti - Remote Sensing, 2021 - mdpi.com
Synthetic aperture radar (SAR) interferometry has rapidly evolved in the last decade and can
be considered today as a mature technology, which incorporates computationally intensive …

IOS: Inter-operator scheduler for CNN acceleration

Y Ding, L Zhu, Z Jia, G Pekhimenko… - … of Machine Learning …, 2021 - proceedings.mlsys.org
To accelerate CNN inference, existing deep learning frameworks focus on optimizing intra-
operator parallelization. However, a single operator can no longer fully utilize the available …

Semi-automatic composition of loop transformations for deep parallelism and memory hierarchies

S Girbal, N Vasilache, C Bastoul, A Cohen… - International Journal of …, 2006 - Springer
Modern compilers are responsible for translating the idealistic operational semantics of the
source program into a form that makes efficient use of a highly complex heterogeneous …

Multi-dimensional rankings, program termination, and complexity bounds of flowchart programs

C Alias, A Darte, P Feautrier, L Gonnord - Static Analysis: 17th International …, 2010 - Springer
Proving the termination of a flowchart program can be done by exhibiting a ranking function,
i.e., a function from the program states to a well-founded set, which strictly decreases at each …

A survey of pipelined workflow scheduling: Models and algorithms

A Benoit, ÜV Çatalyürek, Y Robert… - ACM Computing Surveys …, 2013 - dl.acm.org
A large class of applications need to execute the same workflow on different datasets of
identical size. Efficient execution of such applications necessitates intelligent distribution of …

Whole-function vectorization

R Karrenberg - Automatic SIMD vectorization of SSA-based …, 2015 - Springer
In this chapter, we present the main transformation phases of the Whole-Function
Vectorization algorithm: Mask …

Iterative optimization in the polyhedral model: Part II, multidimensional time

LN Pouchet, C Bastoul, A Cohen, J Cavazos - ACM SIGPLAN Notices, 2008 - dl.acm.org
High-level loop optimizations are necessary to achieve good performance over a wide
variety of processors. Their performance impact can be significant because they involve in …