AlphaSparse: Generating high performance SpMV codes directly from sparse matrices

Z Du, J Li, Y Wang, X Li, G Tan… - … Conference for High …, 2022 - ieeexplore.ieee.org
Sparse Matrix-Vector multiplication (SpMV) is an essential computational kernel in many
application scenarios. Tens of sparse matrix formats and implementations have been …

VGRIS: Virtualized GPU resource isolation and scheduling in cloud gaming

Z Qi, J Yao, C Zhang, M Yu, Z Yang… - ACM Transactions on …, 2014 - dl.acm.org
To achieve efficient resource management on a graphics processing unit (GPU), there is a
demand to develop a framework for scheduling virtualized resources in cloud gaming. In this …

OpenCL task partitioning in the presence of GPU contention

D Grewe, Z Wang, MFP O'Boyle - … Workshop, LCPC 2013, San Jose, CA …, 2014 - Springer
Heterogeneous multi- and many-core systems are increasingly prevalent in the desktop and
mobile domains. On these systems it is common for programs to compete with co-running …

Cloud FPGA cartography using PCIe contention

S Tian, I Giechaskiel, W Xiong… - 2021 IEEE 29th Annual …, 2021 - ieeexplore.ieee.org
Public cloud infrastructures allow for easy, on-demand access to FPGA resources. However,
the low-level, direct access to the FPGA hardware exposes the infrastructure providers to …

PSkel: A stencil programming framework for CPU‐GPU systems

AD Pereira, L Ramos, LFW Góes - … and Computation: Practice …, 2015 - Wiley Online Library
Summary: The use of Graphics Processing Units (GPUs) for high‐performance computing
has gained growing momentum in recent years. Unfortunately, GPU‐programming platforms …

Simulation of reaction diffusion processes over biologically relevant size and time scales using multi-GPU workstations

MJ Hallock, JE Stone, E Roberts, C Fry… - Parallel computing, 2014 - Elsevier
Simulation of in vivo cellular processes with the reaction–diffusion master equation (RDME)
is a computationally expensive task. Our previous software enabled simulation of …

A profile-based AI-assisted dynamic scheduling approach for heterogeneous architectures

T Geng, M Amaris, S Zuckerman, A Goldman… - International Journal of …, 2022 - Springer
While heterogeneous architectures are increasingly popular with High Performance
Computing systems, their effectiveness depends on how efficient the scheduler is at …

A PCIe congestion-aware performance model for densely populated accelerator servers

M Martinasso, G Kwasniewski, SR Alam… - SC'16: Proceedings …, 2016 - ieeexplore.ieee.org
MeteoSwiss, the Swiss national weather forecast institute, has selected densely populated
accelerator servers as their primary system to compute weather forecast simulation. Servers …

Panda: A Compiler Framework for Concurrent CPU+GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers

M Sourouri, SB Baden, X Cai - International Journal of Parallel …, 2017 - Springer
We present a new compiler framework for truly heterogeneous 3D stencil computation on
GPU clusters. Our framework consists of a simple directive-based programming model and a …

Forma: A DSL for image processing applications to target GPUs and multi-core CPUs

M Ravishankar, J Holewinski, V Grover - … of the 8th Workshop on General …, 2015 - dl.acm.org
As architectures evolve, optimization techniques to obtain good performance evolve as well.
Using low-level programming languages like C/C++ typically results in architecture-specific …