Tensor Comprehensions: Framework-agnostic high-performance machine learning abstractions

N Vasilache, O Zinenko, T Theodoridis, P Goyal… - arXiv preprint arXiv …, 2018 - arxiv.org
Deep learning models with convolutional and recurrent networks are now ubiquitous and
analyze massive amounts of audio, image, video, text and graph data, with applications in …

OpenTuner: An extensible framework for program autotuning

J Ansel, S Kamil, K Veeramachaneni… - Proceedings of the 23rd …, 2014 - dl.acm.org
Program autotuning has been shown to achieve better or more portable performance in a
number of domains. However, autotuners themselves are rarely portable between projects …

The design and implementation of FFTW3

M Frigo, SG Johnson - Proceedings of the IEEE, 2005 - ieeexplore.ieee.org
FFTW is an implementation of the discrete Fourier transform (DFT) that adapts to the
hardware in order to maximize performance. This paper shows that such an approach can …

InverseCSG: Automatic conversion of 3D models to CSG trees

T Du, JP Inala, Y Pu, A Spielberg, A Schulz… - ACM Transactions on …, 2018 - dl.acm.org
While computer-aided design is a major part of many modern manufacturing pipelines, the
design files typically generated describe raw geometry. Lost in this representation is the …

SPIRAL: Code generation for DSP transforms

M Püschel, JMF Moura, JR Johnson… - Proceedings of the …, 2005 - ieeexplore.ieee.org
Fast changing, increasingly complex, and diverse computing platforms pose central
problems in scientific computing: How to achieve, with reasonable effort, portable optimal …

Challenges and opportunities in many-core computing

JL Manferdelli, NK Govindaraju… - Proceedings of the …, 2008 - ieeexplore.ieee.org
In this paper, we present some of the challenges and opportunities in software development
based on the current hardware trends and the impact of massive parallelism on both the …

Lightweight modular staging: a pragmatic approach to runtime code generation and compiled DSLs

T Rompf, M Odersky - Proceedings of the ninth international conference …, 2010 - dl.acm.org
Software engineering demands generality and abstraction, performance demands
specialization and concretization. Generative programming can provide both, but the effort …

Programming by sketching for bit-streaming programs

A Solar-Lezama, R Rabbah, R Bodík… - Proceedings of the 2005 …, 2005 - dl.acm.org
This paper introduces the concept of programming with sketches, an approach for the rapid
development of high-performance applications. This approach allows a programmer to write …

A heterogeneous parallel framework for domain-specific languages

KJ Brown, AK Sujeeth, HJ Lee, T Rompf… - 2011 International …, 2011 - ieeexplore.ieee.org
Computing systems are becoming increasingly parallel and heterogeneous, and therefore
new applications must be capable of exploiting parallelism in order to continue achieving …

BLISS: auto-tuning complex applications using a pool of diverse lightweight learning models

RB Roy, T Patel, V Gadepally, D Tiwari - Proceedings of the 42nd ACM …, 2021 - dl.acm.org
As parallel applications become more complex, auto-tuning becomes more desirable,
challenging, and time-consuming. We propose BLISS, a novel solution for auto-tuning …