PolyMage: Automatic optimization for image processing pipelines

RT Mullapudi, V Vasista, U Bondhugula - ACM SIGARCH Computer …, 2015 - dl.acm.org
This paper presents the design and implementation of PolyMage, a domain-specific
language and compiler for image processing pipelines. An image processing pipeline can …

A heuristic clustering-based task deployment approach for load balancing using Bayes theorem in cloud environment

J Zhao, K Yang, X Wei, Y Ding, L Hu… - IEEE Transactions on …, 2015 - ieeexplore.ieee.org
Aiming at the current problems that most physical hosts in the cloud data center are so
overloaded that it makes the whole cloud data center's load imbalanced and that existing load …

High performance stencil code generation with Lift

B Hagedorn, L Stoltzfus, M Steuwer… - Proceedings of the …, 2018 - dl.acm.org
Stencil computations are widely used from physical simulations to machine-learning. They
are embarrassingly parallel and perfectly fit modern hardware such as Graphic Processing …

A stencil compiler for short-vector SIMD architectures

T Henretty, R Veras, F Franchetti, LN Pouchet… - Proceedings of the 27th …, 2013 - dl.acm.org
Stencil computations are an integral component of applications in a number of scientific
computing domains. Short-vector SIMD instruction sets are ubiquitous on modern …

Hybrid hexagonal/classical tiling for GPUs

T Grosser, A Cohen, J Holewinski… - Proceedings of Annual …, 2014 - dl.acm.org
Time-tiling is necessary for the efficient execution of iterative stencil computations. Classical
hyper-rectangular tiles cannot be used due to the combination of backward and forward …

Domain-specific multi-level IR rewriting for GPU: The Open Earth compiler for GPU-accelerated climate simulation

T Gysi, C Müller, O Zinenko, S Herhut, E Davis… - ACM Transactions on …, 2021 - dl.acm.org
Most compilers have a single core intermediate representation (IR) (e.g., LLVM) sometimes
complemented with vaguely defined IR-like data structures. This IR is commonly low-level …

AN5D: automated stencil framework for high-degree temporal blocking on GPUs

K Matsumura, HR Zohouri, M Wahib, T Endo… - Proceedings of the 18th …, 2020 - dl.acm.org
Stencil computation is one of the most widely-used compute patterns in high performance
computing applications. Spatial and temporal blocking have been proposed to overcome the …

OpenCL-based FPGA-platform for stencil computation and its optimization methodology

HM Waidyasooriya, Y Takei, S Tatsumi… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org
Stencil computation is widely used in scientific computations and many accelerators based
on multicore CPUs and GPUs have been proposed. Stencil computation has a small …

Diamond tiling: Tiling techniques to maximize parallelism for stencil computations

U Bondhugula, V Bandishti… - IEEE Transactions on …, 2016 - ieeexplore.ieee.org
Most stencil computations allow tile-wise concurrent start, i.e., there always exists a face of
the iteration space and a set of tiling directions such that all tiles along that face can be …

Domain-specific optimization and generation of high-performance GPU code for stencil computations

PS Rawat, M Vaidya… - Proceedings of the …, 2018 - ieeexplore.ieee.org
Stencil computations arise in a number of computational domains. They exhibit significant
data parallelism and are thus well suited for execution on graphical processing units …