Improving GPGPU concurrency with elastic kernels

S Pai, MJ Thazhuthaveetil… - ACM SIGARCH Computer …, 2013 - dl.acm.org

Each new generation of GPUs vastly increases the resources available to GPGPU
programs. GPU programming models (like CUDA) were designed to scale to use these …

Cited by 291 · Related articles · All 10 versions

[PDF] gatech.edu

Ocelot: a dynamic optimization framework for bulk-synchronous applications in heterogeneous systems

GF Diamos, AR Kerr, S Yalamanchili… - Proceedings of the 19th …, 2010 - dl.acm.org

Ocelot is a dynamic compilation framework designed to map the explicitly data parallel
execution model used by NVIDIA CUDA applications onto diverse multithreaded platforms …

Cited by 352 · Related articles · All 6 versions
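The Ocelot snippet describes mapping CUDA's explicitly data-parallel execution model onto multithreaded platforms. The Python sketch below illustrates one general way such a mapping can work; it is not Ocelot's implementation, which the snippet does not describe. Each thread block becomes a loop over logical threads, and that loop is split at every barrier so no thread runs ahead of `__syncthreads()`. The phase-list representation and the names `launch_on_cpu`, `load_tile`, and `store_reversed` are hypothetical, chosen only for illustration.

```python
from typing import Callable, Dict, List, Sequence

# A "phase" is the part of a kernel body that a GPU thread executes between
# two __syncthreads() barriers. Signature (an assumption of this sketch):
# phase(block_idx, thread_idx, block_dim, shared, args) -> None
Phase = Callable[[int, int, int, Dict[str, List[int]], dict], None]


def launch_on_cpu(phases: Sequence[Phase], grid_dim: int, block_dim: int,
                  args: dict) -> None:
    """Emulate a 1-D CUDA-style launch <<<grid_dim, block_dim>>> on one CPU thread."""
    for block_idx in range(grid_dim):
        # Blocks are independent, so each iteration of this loop could be
        # handed to a separate worker thread; here they simply run serially.
        shared: Dict[str, List[int]] = {}  # stands in for per-block __shared__ memory
        for phase in phases:
            # Every logical thread finishes the current phase before any thread
            # starts the next one, which preserves barrier semantics.
            for thread_idx in range(block_dim):
                phase(block_idx, thread_idx, block_dim, shared, args)


# Hypothetical example kernel: reverse the elements within each block.
def load_tile(b: int, t: int, n: int, shared: Dict[str, List[int]], args: dict) -> None:
    tile = shared.setdefault("tile", [0] * n)
    tile[t] = args["inp"][b * n + t]
    # In the CUDA source, __syncthreads() would follow this store.


def store_reversed(b: int, t: int, n: int, shared: Dict[str, List[int]], args: dict) -> None:
    args["out"][b * n + t] = shared["tile"][n - 1 - t]


if __name__ == "__main__":
    data = list(range(8))
    out = [0] * len(data)
    launch_on_cpu([load_tile, store_reversed], grid_dim=2, block_dim=4,
                  args={"inp": data, "out": out})
    print(out)  # [3, 2, 1, 0, 7, 6, 5, 4]
```

Splitting at the barrier is what makes this correct: if each logical thread instead ran both phases back to back, thread 0 would read `tile[3]` before thread 3 had written it and store a stale zero.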

Performance characterization of the NAS Parallel Benchmarks in OpenCL

S Seo, G Jo, J Lee - 2011 IEEE international symposium on …, 2011 - ieeexplore.ieee.org

Heterogeneous parallel computing platforms, which are composed of different processors
(e.g., CPUs, GPUs, FPGAs, and DSPs), are widening their user base in all computing …

Cited by 272 · Related articles · All 5 versions

[PDF] ufmg.br

Divergence analysis and optimizations

B Coutinho, D Sampaio, FMQ Pereira… - 2011 International …, 2011 - ieeexplore.ieee.org

The growing interest in GPU programming has brought renewed attention to the Single
Instruction Multiple Data (SIMD) execution model. SIMD machines give application …

Cited by 133 · Related articles · All 9 versions

[PDF] cam.ac.uk

A sparse probabilistic learning algorithm for real-time tracking

Blake, Cipolla - Proceedings Ninth IEEE International …, 2003 - ieeexplore.ieee.org

We address the problem of applying powerful pattern recognition algorithms based on
kernels to efficient visual tracking. Recently, S. Avidan (2001) has shown that object …

Cited by 160 · Related articles · All 18 versions

[PDF] psu.edu

Convergence and scalarization for data-parallel architectures

Y Lee, R Krashinsky, V Grover… - Proceedings of the …, 2013 - ieeexplore.ieee.org

Modern throughput processors such as GPUs achieve high performance and efficiency by
exploiting data parallelism in application kernels expressed as threaded code. One draw …

Cited by 88 · Related articles · All 14 versions

[PDF] acm.org

High-performance GPU-to-CPU transpilation and optimization via high-level parallel constructs

WS Moses, IR Ivanov, J Domke, T Endo… - Proceedings of the 28th …, 2023 - dl.acm.org

While parallelism remains the main source of performance, architectural implementations
and programming models change with each new hardware generation, often leading to …

Cited by 19 · Related articles · All 10 versions

[PDF] acm.org

Automatic and portable mapping of data parallel programs to OpenCL for GPU-based heterogeneous systems

Z Wang, D Grewe, MFP O'Boyle - ACM Transactions on Architecture and …, 2014 - dl.acm.org

General-purpose GPU-based systems are highly attractive, as they give potentially massive
performance at little cost. Realizing such potential is challenging due to the complexity of …

Cited by 77 · Related articles · All 5 versions

Topical perspective on massive threading and parallelism

RM Farber - Journal of Molecular Graphics and Modelling, 2011 - Elsevier

Unquestionably, computer architectures have undergone a recent and noteworthy paradigm
shift that now delivers multi- and many-core systems with tens to many thousands of …

Cited by 32 · Related articles · All 4 versions

[PDF] psu.edu

Performance portability with the Chapel language

A Sidelnik, S Maleki, BL Chamberlain… - 2012 IEEE 26th …, 2012 - ieeexplore.ieee.org

It has been widely shown that high-throughput computing architectures such as GPUs offer
large performance gains compared with their traditional low-latency counterparts for many …

Cited by 67 · Related articles · All 10 versions
