- Academic Search

Effective extensible programming: unleashing Julia on GPUs

T Besard, C Foket, B De Sutter - IEEE Transactions on Parallel …, 2018 - ieeexplore.ieee.org

GPUs and other accelerators are popular devices for accelerating compute-intensive,
parallelizable applications. However, programming these devices is a difficult task. Writing …

Cited by 276

[PDF] acm.org

Reverse-mode automatic differentiation and optimization of GPU kernels via Enzyme

WS Moses, V Churavy, L Paehler… - Proceedings of the …, 2021 - dl.acm.org

Computing derivatives is key to many algorithms in scientific computing and machine
learning such as optimization, uncertainty quantification, and stability analysis. Enzyme is a …

Cited by 74

[PDF] springer.com

SkePU 2: Flexible and type-safe skeleton programming for heterogeneous parallel systems

A Ernstsson, L Li, C Kessler - International Journal of Parallel …, 2018 - Springer

In this article we present SkePU 2, the next generation of the SkePU C++ skeleton
programming framework for heterogeneous parallel systems. We critically examine the …

Cited by 118

[PDF] hal.science

Register optimizations for stencils on GPUs

PS Rawat, F Rastello, A Sukumaran-Rajam… - Proceedings of the 23rd …, 2018 - dl.acm.org

The recent advent of compute-intensive GPU architecture has allowed application
developers to explore high-order 3D stencils for better computational accuracy. A common …

Cited by 68

[PDF] google.com

Understanding the GPU microarchitecture to achieve bare-metal performance tuning

X Zhang, G Tan, S Xue, J Li, K Zhou… - Proceedings of the 22nd …, 2017 - dl.acm.org

In this paper, we present a methodology to understand GPU microarchitectural features and
improve performance for compute-intensive kernels. The methodology relies on a reverse …

Cited by 73

[PDF] acm.org

CUDAAdvisor: LLVM-based runtime profiling for modern GPUs

D Shen, SL Song, A Li, X Liu - … of the 2018 International Symposium on …, 2018 - dl.acm.org

General-purpose GPUs have been widely utilized to accelerate parallel applications. Given
a relatively complex programming model and fast architecture evolution, producing efficient …

Cited by 55

[PDF] acm.org

The missing pieces of open design enablement: A recent history of Google efforts

T Ansell, M Saligane - Proceedings of the 39th International Conference …, 2020 - dl.acm.org

In an initiative to advance the open-source electronic design automation (EDA) and
hardware design community, Google has been spearheading a global collaborative effort …

Cited by 39

[HTML] sciencedirect.com

Optimization of flexible neighbors lists in Smoothed Particle Hydrodynamics on GPU

G Bilotta, V Zago, A Hérault, A Cappello… - … in Engineering Software, 2024 - Elsevier

Recent refactoring of the GPUSPH codebase has uncovered some of the limitations of the
official CUDA compiler (nvcc) offered by NVIDIA when dealing with some C++ constructs …

Cited by 1

[PDF] acm.org

Guardian: Safe GPU Sharing in Multi-Tenant Environments

M Pavlidakis, G Vasiliadis, S Mavridis… - Proceedings of the 25th …, 2024 - dl.acm.org

Modern GPU applications, such as machine learning (ML), can only partially utilize GPUs,
leading to GPU underutilization in cloud environments. Sharing GPUs across multiple …

Cited by 2

[PDF] supercomputing.org

CUDA Flux: A lightweight instruction profiler for CUDA applications

L Braun, H Fröning - 2019 IEEE/ACM Performance Modeling …, 2019 - ieeexplore.ieee.org

GPUs are powerful, massively parallel processors, which require a vast amount of thread
parallelism to keep their thousands of execution units busy, and to tolerate latency when …

Cited by 31

gpucc: an open-source GPGPU compiler

Effective extensible programming: unleashing Julia on GPUs