Tiramisu: A polyhedral compiler for expressing fast and portable code

R Baghdadi, J Ray, MB Romdhane… - 2019 IEEE/ACM …, 2019 - ieeexplore.ieee.org
This paper introduces Tiramisu, a polyhedral framework designed to generate high
performance code for multiple platforms including multicores, GPUs, and distributed …

Green-Marl: a DSL for easy and efficient graph analysis

S Hong, H Chafi, E Sedlar, K Olukotun - Proceedings of the seventeenth …, 2012 - dl.acm.org
The increasing importance of graph-data based applications is fueling the need for highly
efficient and parallel implementations of graph analysis software. In this paper we describe …

Delite: A compiler architecture for performance-oriented embedded domain-specific languages

AK Sujeeth, KJ Brown, H Lee, T Rompf… - ACM Transactions on …, 2014 - dl.acm.org
Developing high-performance software is a difficult task that requires the use of low-level,
architecture-specific programming models (e.g., OpenMP for CMPs, CUDA for GPUs, MPI for …

OptiML: an implicitly parallel domain-specific language for machine learning

A Sujeeth, HJ Lee, K Brown, T Rompf… - Proceedings of the …, 2011 - researchgate.net
As the size of datasets continues to grow, machine learning applications are becoming
increasingly limited by the amount of available computational power. Taking advantage of …

Generating performance portable code using rewrite rules: from high-level functional expressions to high-performance OpenCL code

M Steuwer, C Fensch, S Lindley, C Dubach - ACM SIGPLAN Notices, 2015 - dl.acm.org
Computers have become increasingly complex with the emergence of heterogeneous
hardware combining multicore CPUs and GPUs. These parallel systems exhibit tremendous …

A heterogeneous parallel framework for domain-specific languages

KJ Brown, AK Sujeeth, HJ Lee, T Rompf… - 2011 International …, 2011 - ieeexplore.ieee.org
Computing systems are becoming increasingly parallel and heterogeneous, and therefore
new applications must be capable of exploiting parallelism in order to continue achieving …

PENCIL: A platform-neutral compute intermediate language for accelerator programming

R Baghdadi, U Beaugnon, A Cohen… - 2015 International …, 2015 - ieeexplore.ieee.org
Programming accelerators such as GPUs with low-level APIs and languages such as
OpenCL and CUDA is difficult, error-prone, and not performance-portable. Automatic …

Codon: A compiler for high-performance Pythonic applications and DSLs

A Shajii, G Ramirez, H Smajlović, J Ray… - Proceedings of the …, 2023 - dl.acm.org
Domain-specific languages (DSLs) are able to provide intuitive high-level abstractions that
are easy to work with while attaining better performance than general-purpose languages …

CudaDMA: optimizing GPU memory bandwidth via warp specialization

M Bauer, H Cook, B Khailany - … of 2011 international conference for high …, 2011 - dl.acm.org
As the computational power of GPUs continues to scale with Moore's Law, an increasing
number of applications are becoming limited by memory bandwidth. We propose an …

DimmWitted: A study of main-memory statistical analytics

C Zhang, C Ré - arXiv preprint arXiv:1403.7550, 2014 - arxiv.org
We perform the first study of the tradeoff space of access methods and replication to support
statistical analytics using first-order methods executed in the main memory of a Non-Uniform …