Google Scholar

Optimization techniques for GPU programming

P Hijma, S Heldens, A Sclocco… - ACM Computing …, 2023 - dl.acm.org

In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …

Cited by 69 Related articles All 3 versions

[PDF] arxiv.org

Optimizing CUDA code by kernel fusion: application on BLAS

J Filipovič, M Madzin, J Fousek, L Matyska - The Journal of …, 2015 - Springer

Contemporary GPUs have significantly higher arithmetic throughput than a memory
throughput. Hence, many GPU kernels are memory bound and cannot exploit arithmetic …

Cited by 106 Related articles All 14 versions
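The snippet above argues that memory-bound kernels waste arithmetic throughput, which is what kernel fusion targets. As a minimal sketch of that idea (not the paper's method), the block below uses plain C++ loops as a CPU stand-in for two GPU kernels, assuming an AXPY followed by a DOT on the same vectors, a pairing chosen here only for illustration. The function names are hypothetical. The fused version touches each element of `y` once instead of re-reading it in a second pass.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical unfused version: two separate passes, standing in for two kernels.
// Per element: pass 1 reads x[i], y[i] and writes y[i] (3 accesses),
// pass 2 reads x[i] and y[i] again (2 accesses), for 5 memory accesses total.
double axpy_then_dot_unfused(double alpha, const std::vector<double>& x,
                             std::vector<double>& y) {
    assert(x.size() == y.size());
    for (std::size_t i = 0; i < x.size(); ++i) y[i] = alpha * x[i] + y[i];
    double dot = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) dot += x[i] * y[i];
    return dot;
}

// Hypothetical fused version: one pass. The updated y[i] is reused from a
// register, so each element needs only 3 memory accesses (read x, read y, write y).
double axpy_then_dot_fused(double alpha, const std::vector<double>& x,
                           std::vector<double>& y) {
    assert(x.size() == y.size());
    double dot = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double yi = alpha * x[i] + y[i];
        y[i] = yi;
        dot += x[i] * yi;
    }
    return dot;
}
```

Both functions compute the same result. For vectors too large to stay in cache, the fused loop moves roughly 3/5 of the data of the unfused pair, which is the kind of saving that matters when a kernel is memory bound rather than compute bound.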

[PDF] acm.org

Logca: A high-level performance model for hardware accelerators

MSB Altaf, DA Wood - ACM SIGARCH Computer Architecture News, 2017 - dl.acm.org

With the end of Dennard scaling, architects have increasingly turned to special-purpose
hardware accelerators to improve the performance and energy efficiency for some …

Cited by 57 Related articles All 10 versions

[PDF] arxiv.org

GHOST: building blocks for high performance sparse linear algebra on heterogeneous systems

M Kreutzer, J Thies, M Röhrig-Zöllner, A Pieper… - International Journal of …, 2017 - Springer

While many of the architectural details of future exascale-class high performance computer
systems are still a matter of intense research, there appears to be a general consensus that …

Cited by 52 Related articles All 10 versions

[PDF] googleapis.com

Pipelined approach to fused kernels for optimization of machine learning workloads on graphical processing units

A Ashari, M Boehm, KW Campbell… - US Patent …, 2018 - Google Patents

(57) ABSTRACT A method for optimization of machine learning (ML) work loads on a
graphics processor unit (GPU). The method includes identifying a computation having a …

Cited by 37 Related articles All 4 versions Cached

[PDF] researchgate.net

Systematic fusion of CUDA kernels for iterative sparse linear system solvers

JI Aliaga, J Pérez, ES Quintana-Ortí - European Conference on Parallel …, 2015 - Springer

We introduce a systematic analysis in order to fuse CUDA kernels arising in efficient iterative
methods for the solution of sparse linear systems. Our procedure characterizes the input and …

Cited by 26 Related articles All 4 versions

[PDF] hal.science

Real-time optical flow processing on embedded GPU: an hardware-aware algorithm to implementation strategy

M Seznec, N Gac, F Orieux, AS Naik - Journal of Real-Time Image …, 2022 - Springer

Determining the optical flow of a video is a compute-intensive task essential for computer
vision. For achieving this processing in real time, the whole algorithm deployment chain …

Cited by 5 Related articles All 9 versions

[PDF] upv.es

Accelerating the task/data-parallel version of ILUPACK's BiCG in multi-CPU/GPU configurations

JI Aliaga, E Dufrechou, P Ezzatti, ES Quintana-Ortí - Parallel Computing, 2019 - Elsevier

ILUPACK is a valuable tool for the solution of sparse linear systems via iterative Krylov
subspace-based methods. Its relevance for the solution of real problems has motivated …

Cited by 11 Related articles All 6 versions

[PDF] proquest.com

Time-domain simulation of large electric power systems using domain-decomposition and parallel processing methods

P Aristidou - 2015 - search.proquest.com

Dynamic simulation studies are used to analyze the behavior of power systems after a
disturbance has occurred. Over the last decades, they have become indispensable to …

Cited by 15 Related articles All 3 versions

[PDF] d-nb.info

Optimization and performance evaluation of the IDR iterative Krylov solver on GPUs

H Anzt, M Kreutzer, E Ponce… - … Journal of High …, 2018 - journals.sagepub.com

In this paper, we present an optimized GPU implementation for the induced dimension
reduction algorithm. We improve data locality, combine it with an efficient sparse matrix …

Cited by 15 Related articles All 8 versions

Performance evaluation of kernel fusion BLAS routines on the GPU: iterative solvers as case study