- Academic Search

Cloud computing landscape and research challenges regarding trust and reputation

SM Habib, S Ries, M Muhlhauser - 2010 7th International …, 2010 - ieeexplore.ieee.org

Cloud Computing is an emerging computing paradigm. It shares massively scalable, elastic
resources (e.g., data, calculations, and services) transparently among the users over a …

Cited by 193

[PDF] hiperfit.dk

Futhark: purely functional GPU-programming with nested parallelism and in-place array updates

T Henriksen, NGW Serup, M Elsman… - Proceedings of the 38th …, 2017 - dl.acm.org

Futhark is a purely functional data-parallel array language that offers a machine-neutral
programming model and an optimising compiler that generates OpenCL code for GPUs …

Cited by 240

[PDF] hw.ac.uk

Generating performance portable code using rewrite rules: from high-level functional expressions to high-performance OpenCL code

M Steuwer, C Fensch, S Lindley, C Dubach - ACM SIGPLAN Notices, 2015 - dl.acm.org

Computers have become increasingly complex with the emergence of heterogeneous
hardware combining multicore CPUs and GPUs. These parallel systems exhibit tremendous …

Cited by 193

[PDF] acm.org

A compiler for throughput optimization of graph algorithms on GPUs

S Pai, K Pingali - Proceedings of the 2016 ACM SIGPLAN International …, 2016 - dl.acm.org

Writing high-performance GPU implementations of graph algorithms can be challenging. In
this paper, we argue that three optimizations called throughput optimizations are key to high …

Cited by 127

[PDF] unsw.edu.au

Optimising purely functional GPU programs

TL McDonell, MMT Chakravarty, G Keller… - ACM SIGPLAN …, 2013 - dl.acm.org

Purely functional, embedded array programs are a good match for SIMD hardware, such as
GPUs. However, the naive compilation of such programs quickly leads to both code …

Cited by 149

[PDF] futhark-lang.org

Incremental flattening for nested data parallelism

T Henriksen, F Thorøe, M Elsman… - Proceedings of the 24th …, 2019 - dl.acm.org

Compilation techniques for nested-parallel applications that can adapt to hardware and
dataset characteristics are vital for unlocking the power of modern hardware. This paper …

Cited by 53

[PDF] danielwong.org

Dynamic thread block launch: A lightweight execution mechanism to support irregular applications on GPUs

J Wang, N Rubin, A Sidelnik… - ACM SIGARCH Computer …, 2015 - dl.acm.org

GPUs have been proven effective for structured applications that map well to the rigid 1D-3D
grid of threads in modern bulk synchronous parallel (BSP) programming languages …

Cited by 88

[PDF] semanticscholar.org

LaPerm: Locality aware scheduler for dynamic parallelism on GPUs

J Wang, N Rubin, A Sidelnik… - ACM SIGARCH Computer …, 2016 - dl.acm.org

Recent developments in GPU execution models and architectures have introduced dynamic
parallelism to facilitate the execution of irregular applications where control flow and …

Cited by 69

[PDF] arxiv.org

Baechi: fast device placement of machine learning graphs

B Jeon, L Cai, P Srivastava, J Jiang, X Ke… - Proceedings of the 11th …, 2020 - dl.acm.org

Machine Learning graphs (or models) can be challenging or impossible to train when either
devices have limited memory, or the models are large. Splitting the model graph across …

Cited by 78

[PDF] acm.org

Wireframe: Supporting data-dependent parallelism through dependency graph execution in GPUs

AA Abdolrashidi, D Tripathy, ME Belviranli… - Proceedings of the 50th …, 2017 - dl.acm.org

GPUs lack fundamental support for data-dependent parallelism and synchronization. While
CUDA Dynamic Parallelism signals progress in this direction, many limitations and …

Cited by 45

Nested data-parallelism on the GPU

Cloud computing landscape and research challenges regarding trust and reputation
