Optimization techniques for GPU programming
In the past decade, Graphics Processing Units have played an important role in the field of
high-performance computing and they still advance new fields such as IoT, autonomous …
Optimizing CUDA code by kernel fusion: application on BLAS
Contemporary GPUs have significantly higher arithmetic throughput than memory
throughput. Hence, many GPU kernels are memory bound and cannot exploit arithmetic …
LogCA: A high-level performance model for hardware accelerators
With the end of Dennard scaling, architects have increasingly turned to special-purpose
hardware accelerators to improve the performance and energy efficiency for some …
GHOST: building blocks for high performance sparse linear algebra on heterogeneous systems
While many of the architectural details of future exascale-class high performance computer
systems are still a matter of intense research, there appears to be a general consensus that …
Pipelined approach to fused kernels for optimization of machine learning workloads on graphical processing units
A method for optimization of machine learning (ML) workloads on a
graphics processor unit (GPU). The method includes identifying a computation having a …
Systematic fusion of CUDA kernels for iterative sparse linear system solvers
We introduce a systematic analysis in order to fuse CUDA kernels arising in efficient iterative
methods for the solution of sparse linear systems. Our procedure characterizes the input and …
Real-time optical flow processing on embedded GPU: an hardware-aware algorithm to implementation strategy
Determining the optical flow of a video is a compute-intensive task essential for computer
vision. For achieving this processing in real time, the whole algorithm deployment chain …
Accelerating the task/data-parallel version of ILUPACK's BiCG in multi-CPU/GPU configurations
ILUPACK is a valuable tool for the solution of sparse linear systems via iterative Krylov
subspace-based methods. Its relevance for the solution of real problems has motivated …
Time-domain simulation of large electric power systems using domain-decomposition and parallel processing methods
P Aristidou - 2015 - search.proquest.com
Dynamic simulation studies are used to analyze the behavior of power systems after a
disturbance has occurred. Over the last decades, they have become indispensable to …
Optimization and performance evaluation of the IDR iterative Krylov solver on GPUs
In this paper, we present an optimized GPU implementation for the induced dimension
reduction algorithm. We improve data locality, combine it with an efficient sparse matrix …