AlphaSparse: Generating high performance SpMV codes directly from sparse matrices
Sparse Matrix-Vector multiplication (SpMV) is an essential computational kernel in many
application scenarios. Tens of sparse matrix formats and implementations have been …
VGRIS: Virtualized GPU resource isolation and scheduling in cloud gaming
To achieve efficient resource management on a graphics processing unit (GPU), there is a
demand to develop a framework for scheduling virtualized resources in cloud gaming. In this …
OpenCL task partitioning in the presence of GPU contention
Heterogeneous multi- and many-core systems are increasingly prevalent in the desktop and
mobile domains. On these systems it is common for programs to compete with co-running …
Cloud FPGA cartography using PCIe contention
Public cloud infrastructures allow for easy, on-demand access to FPGA resources. However,
the low-level, direct access to the FPGA hardware exposes the infrastructure providers to …
PSkel: A stencil programming framework for CPU‐GPU systems
The use of Graphics Processing Units (GPUs) for high‐performance computing
has gained growing momentum in recent years. Unfortunately, GPU‐programming platforms …
Simulation of reaction diffusion processes over biologically relevant size and time scales using multi-GPU workstations
Simulation of in vivo cellular processes with the reaction–diffusion master equation (RDME)
is a computationally expensive task. Our previous software enabled simulation of …
A profile-based ai-assisted dynamic scheduling approach for heterogeneous architectures
While heterogeneous architectures are increasingly popular with High Performance
Computing systems, their effectiveness depends on how efficient the scheduler is at …
A PCIe congestion-aware performance model for densely populated accelerator servers
MeteoSwiss, the Swiss national weather forecast institute, has selected densely populated
accelerator servers as their primary system to compute weather forecast simulation. Servers …
Panda: A Compiler Framework for Concurrent CPU+GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers
We present a new compiler framework for truly heterogeneous 3D stencil computation on
GPU clusters. Our framework consists of a simple directive-based programming model and a …
Forma: A DSL for image processing applications to target GPUs and multi-core CPUs
As architectures evolve, optimization techniques to obtain good performance evolve as well.
Using low-level programming languages like C/C++ typically results in architecture-specific …