Cloud computing landscape and research challenges regarding trust and reputation
Cloud Computing is an emerging computing paradigm. It shares massively scalable, elastic
resources (e.g., data, calculations, and services) transparently among the users over a …
Futhark: purely functional GPU-programming with nested parallelism and in-place array updates
Futhark is a purely functional data-parallel array language that offers a machine-neutral
programming model and an optimising compiler that generates OpenCL code for GPUs …
Generating performance portable code using rewrite rules: from high-level functional expressions to high-performance OpenCL code
Computers have become increasingly complex with the emergence of heterogeneous
hardware combining multicore CPUs and GPUs. These parallel systems exhibit tremendous …
A compiler for throughput optimization of graph algorithms on GPUs
Writing high-performance GPU implementations of graph algorithms can be challenging. In
this paper, we argue that three optimizations called throughput optimizations are key to high …
Optimising purely functional GPU programs
Purely functional, embedded array programs are a good match for SIMD hardware, such as
GPUs. However, the naive compilation of such programs quickly leads to both code …
Incremental flattening for nested data parallelism
Compilation techniques for nested-parallel applications that can adapt to hardware and
dataset characteristics are vital for unlocking the power of modern hardware. This paper …
Dynamic thread block launch: A lightweight execution mechanism to support irregular applications on GPUs
GPUs have been proven effective for structured applications that map well to the rigid 1D-3D
grid of threads in modern bulk synchronous parallel (BSP) programming languages …
LaPerm: Locality aware scheduler for dynamic parallelism on GPUs
Recent developments in GPU execution models and architectures have introduced dynamic
parallelism to facilitate the execution of irregular applications where control flow and …
Baechi: fast device placement of machine learning graphs
Machine learning graphs (or models) can be challenging or impossible to train when either
devices have limited memory, or the models are large. Splitting the model graph across …
Wireframe: Supporting data-dependent parallelism through dependency graph execution in GPUs
GPUs lack fundamental support for data-dependent parallelism and synchronization. While
CUDA Dynamic Parallelism signals progress in this direction, many limitations and …