Tiramisu: A polyhedral compiler for expressing fast and portable code

R Baghdadi, J Ray, MB Romdhane… - 2019 IEEE/ACM …, 2019 - ieeexplore.ieee.org
This paper introduces Tiramisu, a polyhedral framework designed to generate high
performance code for multiple platforms including multicores, GPUs, and distributed …

Data locality in high performance computing, big data, and converged systems: An analysis of the cutting edge and a future system architecture

S Usman, R Mehmood, I Katib, A Albeshri - Electronics, 2022 - mdpi.com
Big data has revolutionized science and technology leading to the transformation of our
societies. High-performance computing (HPC) provides the necessary computational power …

Polymage: Automatic optimization for image processing pipelines

RT Mullapudi, V Vasista, U Bondhugula - ACM SIGARCH Computer …, 2015 - dl.acm.org
This paper presents the design and implementation of PolyMage, a domain-specific
language and compiler for image processing pipelines. An image processing pipeline can …

Patus: A code generation and autotuning framework for parallel iterative stencil computations on modern microarchitectures

M Christen, O Schenk, H Burkhart - 2011 IEEE International …, 2011 - ieeexplore.ieee.org
Stencil calculations comprise an important class of kernels in many scientific computing
applications ranging from simple PDE solvers to constituent kernels in multigrid methods as …
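To make "stencil calculation" concrete, here is a minimal sketch of a 1D three-point Jacobi averaging stencil, the kind of kernel found in simple PDE solvers. It is only an illustration of the kernel class and is not Patus code or Patus-generated output. The function name `jacobi_1d` is hypothetical, and the sketch assumes fixed boundary values.

```python
def jacobi_1d(u, steps):
    """Apply a 3-point averaging stencil `steps` times.

    Each interior point becomes the mean of itself and its two neighbours.
    The two boundary points are held fixed (an assumption of this sketch).
    """
    u = list(u)
    n = len(u)
    for _ in range(steps):
        nxt = u[:]  # separate output buffer: every update reads only old values
        for i in range(1, n - 1):
            nxt[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
        u = nxt
    return u
```

For example, `jacobi_1d([0.0, 0.0, 3.0, 0.0, 0.0], 1)` spreads the central value to `[0.0, 1.0, 1.0, 1.0, 0.0]`. Frameworks such as the one above target exactly this neighbour-access pattern, generating and tuning blocked, vectorized, and parallel variants of the inner loop.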

Autotuning in high-performance computing applications

P Balaprakash, J Dongarra, T Gamblin… - Proceedings of the …, 2018 - ieeexplore.ieee.org
Autotuning refers to the automatic generation of a search space of possible implementations
of a computation that are evaluated through models and/or empirical measurement to …
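The definition above can be sketched as a tiny empirical autotuner. This sketch assumes a toy kernel (a summation blocked by a tunable tile size) and a small hand-written search space, and it evaluates candidates by measurement only, not by a performance model. The names `blocked_sum` and `autotune` are hypothetical and do not come from any tool surveyed in the paper.

```python
import time


def blocked_sum(data, tile):
    """Toy kernel: sum `data` in chunks of `tile` elements (the tunable parameter)."""
    total = 0
    for start in range(0, len(data), tile):
        total += sum(data[start:start + tile])
    return total


def autotune(kernel, data, search_space, repeats=3):
    """Time `kernel` once per repeat for every candidate parameter.

    Returns (best_param, timings), where timings maps each parameter to its
    fastest observed run in seconds.
    """
    timings = {}
    for param in search_space:
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            kernel(data, param)
            best = min(best, time.perf_counter() - t0)
        timings[param] = best
    best_param = min(timings, key=timings.get)
    return best_param, timings


if __name__ == "__main__":
    data = list(range(100_000))
    search_space = [2 ** k for k in range(4, 15)]  # tile sizes 16 .. 16384
    best, timings = autotune(blocked_sum, data, search_space)
    print("best tile size:", best)
```

Taking the minimum over repeats is a common way to reduce timing noise. Real autotuners add much larger search spaces, pruning or model-guided search, and correctness checks on each variant.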

Achieving high-performance the functional way: a functional pearl on expressing high-performance optimizations as rewrite strategies

B Hagedorn, J Lenfers, T Koehler, X Qin… - Proceedings of the …, 2020 - dl.acm.org
Optimizing programs to run efficiently on modern parallel hardware is hard but crucial for
many applications. The predominantly used imperative languages, like C or OpenCL, force …

Loop and data transformations for sparse matrix code

A Venkat, M Hall, M Strout - ACM SIGPLAN Notices, 2015 - dl.acm.org
This paper introduces three new compiler transformations for representing and transforming
sparse matrix computations and their data representations. In cooperation with run-time …

Register optimizations for stencils on GPUs

PS Rawat, F Rastello, A Sukumaran-Rajam… - Proceedings of the 23rd …, 2018 - dl.acm.org
The recent advent of compute-intensive GPU architecture has allowed application
developers to explore high-order 3D stencils for better computational accuracy. A common …

Loo.py: transformation-based code generation for GPUs and CPUs

A Klöckner - Proceedings of ACM SIGPLAN international workshop …, 2014 - dl.acm.org
Today's highly heterogeneous computing landscape places a burden on programmers
wanting to achieve high performance on a reasonably broad cross-section of machines. To …

A programming language interface to describe transformations and code generation

G Rudy, MM Khan, M Hall, C Chen… - … TX, USA, October 7-9, 2010 …, 2011 - Springer
This paper presents a programming language interface, a complete scripting language, to
describe composable compiler transformations. These transformation programs can be …