- Academic Search

Optimization techniques for GPU programming

P Hijma, S Heldens, A Sclocco… - ACM Computing …, 2023 - dl.acm.org

In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …

Cited by 69

[PDF] acm.org

A Comprehensive Survey of Benchmarks for Improvement of Software's Non-Functional Properties

A Blot, J Petke - ACM Computing Surveys, 2025 - dl.acm.org

Despite recent increase in research on improvement of non-functional properties of
software, such as energy usage or program size, there is a lack of standard benchmarks for …

Cited by 1

[PDF] hiperfit.dk

Futhark: purely functional GPU-programming with nested parallelism and in-place array updates

T Henriksen, NGW Serup, M Elsman… - Proceedings of the 38th …, 2017 - dl.acm.org

Futhark is a purely functional data-parallel array language that offers a machine-neutral
programming model and an optimising compiler that generates OpenCL code for GPUs …

Cited by 243

[PDF] academia.edu

A comprehensive performance comparison of CUDA and OpenCL

J Fang, AL Varbanescu, H Sips - … International Conference on …, 2011 - ieeexplore.ieee.org

This paper presents a comprehensive performance comparison between CUDA and
OpenCL. We have selected 16 benchmarks ranging from synthetic applications to real-world …

Cited by 465

[PDF] vuduc.org

A performance analysis framework for identifying potential benefits in GPGPU applications

J Sim, A Dasgupta, H Kim, R Vuduc - Proceedings of the 17th ACM …, 2012 - dl.acm.org

Tuning code for GPGPU and other emerging many-core platforms is a challenge because
few models or tools can precisely pinpoint the root cause of performance bottlenecks. In this …

Cited by 274

[PDF] udel.edu

Reducing branch divergence in GPU programs

TD Han, TS Abdelrahman - Proceedings of the fourth workshop on …, 2011 - dl.acm.org

Branch divergence has a significant impact on the performance of GPU programs. We
propose two novel software-based optimizations, called iteration delaying and branch …

Cited by 288

[PDF] arxiv.org

Optimizing memory efficiency for deep convolutional neural networks on GPUs

C Li, Y Yang, M Feng, S Chakradhar… - SC'16: Proceedings of …, 2016 - ieeexplore.ieee.org

Leveraging large data sets, deep Convolutional Neural Networks (CNNs) achieve state-of-
the-art recognition accuracy. Due to the substantial compute and memory operations …

Cited by 141

[PDF] psu.edu

On-the-fly elimination of dynamic irregularities for GPU computing

EZ Zhang, Y Jiang, Z Guo, K Tian, X Shen - ACM SIGPLAN Notices, 2011 - dl.acm.org

The power-efficient massively parallel Graphics Processing Units (GPUs) have become
increasingly influential for general-purpose computing over the past few years. However …

Cited by 255

Many-thread aware prefetching mechanisms for GPGPU applications

J Lee, NB Lakshminarayana, H Kim… - 2010 43rd Annual IEEE …, 2010 - ieeexplore.ieee.org

We consider the problem of how to improve memory latency tolerance in massively
multithreaded GPGPUs when the thread-level parallelism of an application is not sufficient to …

Cited by 195

[PDF] cmu.edu

Characterizing and improving the use of demand-fetched caches in GPUs

W Jia, KA Shaw, M Martonosi - … of the 26th ACM international conference …, 2012 - dl.acm.org

Initially introduced as special-purpose accelerators for games and graphics code, graphics
processing units (GPUs) have emerged as widely-used high-performance parallel …

Cited by 180


A GPGPU compiler for memory optimization and parallelism management

Optimization techniques for GPU programming
