Improving GPGPU concurrency with elastic kernels

S Pai, MJ Thazhuthaveetil… - ACM SIGARCH Computer …, 2013 - dl.acm.org
Each new generation of GPUs vastly increases the resources available to GPGPU
programs. GPU programming models (like CUDA) were designed to scale to use these …

Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems

GF Diamos, AR Kerr, S Yalamanchili… - Proceedings of the 19th …, 2010 - dl.acm.org
Ocelot is a dynamic compilation framework designed to map the explicitly data parallel
execution model used by NVIDIA CUDA applications onto diverse multithreaded platforms …

Performance characterization of the NAS Parallel Benchmarks in OpenCL

S Seo, G Jo, J Lee - 2011 IEEE international symposium on …, 2011 - ieeexplore.ieee.org
Heterogeneous parallel computing platforms, which are composed of different processors
(e.g., CPUs, GPUs, FPGAs, and DSPs), are widening their user base in all computing …

Divergence analysis and optimizations

B Coutinho, D Sampaio, FMQ Pereira… - 2011 International …, 2011 - ieeexplore.ieee.org
The growing interest in GPU programming has brought renewed attention to the Single
Instruction Multiple Data (SIMD) execution model. SIMD machines give application …

A sparse probabilistic learning algorithm for real-time tracking

Blake, Cipolla - Proceedings Ninth IEEE International …, 2003 - ieeexplore.ieee.org
We address the problem of applying powerful pattern recognition algorithms based on
kernels to efficient visual tracking. Recently S. Avidan (2001) has shown that object …

Convergence and scalarization for data-parallel architectures

Y Lee, R Krashinsky, V Grover… - Proceedings of the …, 2013 - ieeexplore.ieee.org
Modern throughput processors such as GPUs achieve high performance and efficiency by
exploiting data parallelism in application kernels expressed as threaded code. One draw …

High-performance GPU-to-CPU transpilation and optimization via high-level parallel constructs

WS Moses, IR Ivanov, J Domke, T Endo… - Proceedings of the 28th …, 2023 - dl.acm.org
While parallelism remains the main source of performance, architectural implementations
and programming models change with each new hardware generation, often leading to …

Automatic and portable mapping of data parallel programs to OpenCL for GPU-based heterogeneous systems

Z Wang, D Grewe, MFP O'Boyle - ACM Transactions on Architecture and …, 2014 - dl.acm.org
General-purpose GPU-based systems are highly attractive, as they give potentially massive
performance at little cost. Realizing such potential is challenging due to the complexity of …

Topical perspective on massive threading and parallelism

RM Farber - Journal of Molecular Graphics and Modelling, 2011 - Elsevier
Unquestionably computer architectures have undergone a recent and noteworthy paradigm
shift that now delivers multi- and many-core systems with tens to many thousands of …

Performance portability with the Chapel language

A Sidelnik, S Maleki, BL Chamberlain… - 2012 IEEE 26th …, 2012 - ieeexplore.ieee.org
It has been widely shown that high-throughput computing architectures such as GPUs offer
large performance gains compared with their traditional low-latency counterparts for many …