Improving GPGPU concurrency with elastic kernels
S Pai, MJ Thazhuthaveetil… - ACM SIGARCH Computer …, 2013 - dl.acm.org
Each new generation of GPUs vastly increases the resources available to GPGPU
programs. GPU programming models (like CUDA) were designed to scale to use these …
Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems
GF Diamos, AR Kerr, S Yalamanchili… - Proceedings of the 19th …, 2010 - dl.acm.org
Ocelot is a dynamic compilation framework designed to map the explicitly data parallel
execution model used by NVIDIA CUDA applications onto diverse multithreaded platforms …
Performance characterization of the NAS Parallel Benchmarks in OpenCL
Heterogeneous parallel computing platforms, which are composed of different processors
(e.g., CPUs, GPUs, FPGAs, and DSPs), are widening their user base in all computing …
Divergence analysis and optimizations
The growing interest in GPU programming has brought renewed attention to the Single
Instruction Multiple Data (SIMD) execution model. SIMD machines give application …
A sparse probabilistic learning algorithm for real-time tracking
We address the problem of applying powerful pattern recognition algorithms based on
kernels to efficient visual tracking. Recently S. Avidan (2001) has shown that object …
Convergence and scalarization for data-parallel architectures
Modern throughput processors such as GPUs achieve high performance and efficiency by
exploiting data parallelism in application kernels expressed as threaded code. One draw …
High-performance GPU-to-CPU transpilation and optimization via high-level parallel constructs
While parallelism remains the main source of performance, architectural implementations
and programming models change with each new hardware generation, often leading to …
Automatic and portable mapping of data parallel programs to OpenCL for GPU-based heterogeneous systems
General-purpose GPU-based systems are highly attractive, as they give potentially massive
performance at little cost. Realizing such potential is challenging due to the complexity of …
Topical perspective on massive threading and parallelism
RM Farber - Journal of Molecular Graphics and Modelling, 2011 - Elsevier
Unquestionably computer architectures have undergone a recent and noteworthy paradigm
shift that now delivers multi- and many-core systems with tens to many thousands of …
Performance portability with the Chapel language
It has been widely shown that high-throughput computing architectures such as GPUs offer
large performance gains compared with their traditional low-latency counterparts for many …