Futhark: purely functional GPU-programming with nested parallelism and in-place array updates

T Henriksen, NGW Serup, M Elsman… - Proceedings of the 38th …, 2017 - dl.acm.org
Futhark is a purely functional data-parallel array language that offers a machine-neutral
programming model and an optimising compiler that generates OpenCL code for GPUs …

Incremental flattening for nested data parallelism

T Henriksen, F Thorøe, M Elsman… - Proceedings of the 24th …, 2019 - dl.acm.org
Compilation techniques for nested-parallel applications that can adapt to hardware and
dataset characteristics are vital for unlocking the power of modern hardware. This paper …

Destination-passing style for efficient memory management

A Shaikhha, A Fitzgibbon, S Peyton Jones… - Proceedings of the 6th …, 2017 - dl.acm.org
We show how to compile high-level functional array-processing programs, drawn from
image processing and machine learning, into C code that runs as fast as hand-written C …

Towards size-dependent types for array programming

T Henriksen, M Elsman - Proceedings of the 7th ACM SIGPLAN …, 2021 - dl.acm.org
We present a type system for expressing size constraints on array types in an ML-style type
system. The goal is to detect shape mismatches at compile-time, while being simpler than …

FinPar: A parallel financial benchmark

C Andreetta, V Bégot, J Berthold, M Elsman… - ACM Transactions on …, 2016 - dl.acm.org
Commodity many-core hardware is now mainstream, but parallel programming models are
still lagging behind in efficiently utilizing the application parallelism. There are (at least) two …

Static interpretation of higher-order modules in Futhark: Functional GPU programming in the large

M Elsman, T Henriksen, D Annenkov… - Proceedings of the ACM …, 2018 - dl.acm.org
We present a higher-order module system for the purely functional data-parallel array
language Futhark. The module language has the property that it is completely eliminated at …

Strategies for regular segmented reductions on GPU

RW Larsen, T Henriksen - Proceedings of the 6th ACM SIGPLAN …, 2017 - dl.acm.org
We present and evaluate an implementation technique for regular segmented reductions on
GPUs. Existing techniques tend to be either consistent in performance but relatively …

Formal semantics for the Halide language

A Reinking, GL Bernstein, J Ragan-Kelley - arXiv preprint arXiv …, 2022 - arxiv.org
We present the first formalization and metatheory of language soundness for a
user-schedulable language, the widely used array processing language Halide. User …

High-Performance Defunctionalisation in Futhark

AK Hovgaard, T Henriksen, M Elsman - International Symposium on …, 2018 - Springer
General-purpose massively parallel processors, such as GPUs, have become common, but
are difficult to program. Pure functional programming can be a solution, as it guarantees …

Modular acceleration: tricky cases of functional high-performance computing

T Henriksen, M Elsman, CE Oancea - Proceedings of the 7th ACM …, 2018 - dl.acm.org
This case study examines the data-parallel functional implementation of three algorithms:
generation of quasi-random Sobol numbers, breadth-first search, and calibration of Heston …