- Academic Search

Alphasparse: Generating high performance SpMV codes directly from sparse matrices

Z Du, J Li, Y Wang, X Li, G Tan… - … Conference for High …, 2022 - ieeexplore.ieee.org

Sparse Matrix-Vector multiplication (SpMV) is an essential computational kernel in many
application scenarios. Tens of sparse matrix formats and implementations have been …

Cited by 28

VGRIS: Virtualized GPU resource isolation and scheduling in cloud gaming

Z Qi, J Yao, C Zhang, M Yu, Z Yang… - ACM Transactions on …, 2014 - dl.acm.org

To achieve efficient resource management on a graphics processing unit (GPU), there is a
demand to develop a framework for scheduling virtualized resources in cloud gaming. In this …

Cited by 115

OpenCL task partitioning in the presence of GPU contention

D Grewe, Z Wang, MFP O'Boyle - … Workshop, LCPC 2013, San Jose, CA …, 2014 - Springer

Heterogeneous multi- and many-core systems are increasingly prevalent in the desktop and
mobile domains. On these systems it is common for programs to compete with co-running …

Cited by 80

Cloud FPGA cartography using PCIe contention

S Tian, I Giechaskiel, W Xiong… - 2021 IEEE 29th Annual …, 2021 - ieeexplore.ieee.org

Public cloud infrastructures allow for easy, on-demand access to FPGA resources. However,
the low-level, direct access to the FPGA hardware exposes the infrastructure providers to …

Cited by 22

PSkel: A stencil programming framework for CPU‐GPU systems

AD Pereira, L Ramos, LFW Góes - … and Computation: Practice …, 2015 - Wiley Online Library

The use of Graphics Processing Units (GPUs) for high‐performance computing
has gained growing momentum in recent years. Unfortunately, GPU‐programming platforms …

Cited by 51

Simulation of reaction diffusion processes over biologically relevant size and time scales using multi-GPU workstations

MJ Hallock, JE Stone, E Roberts, C Fry… - Parallel computing, 2014 - Elsevier

Simulation of in vivo cellular processes with the reaction–diffusion master equation (RDME)
is a computationally expensive task. Our previous software enabled simulation of …

Cited by 59

A profile-based AI-assisted dynamic scheduling approach for heterogeneous architectures

T Geng, M Amaris, S Zuckerman, A Goldman… - International Journal of …, 2022 - Springer

While heterogeneous architectures are increasingly popular with High Performance
Computing systems, their effectiveness depends on how efficient the scheduler is at …

Cited by 11

A PCIe congestion-aware performance model for densely populated accelerator servers

M Martinasso, G Kwasniewski, SR Alam… - SC'16: Proceedings …, 2016 - ieeexplore.ieee.org

MeteoSwiss, the Swiss national weather forecast institute, has selected densely populated
accelerator servers as their primary system to compute weather forecast simulation. Servers …

Cited by 40

Panda: A Compiler Framework for Concurrent CPU+GPU Execution of 3D Stencil Computations on GPU-accelerated Supercomputers

M Sourouri, SB Baden, X Cai - International Journal of Parallel …, 2017 - Springer

We present a new compiler framework for truly heterogeneous 3D stencil computation on
GPU clusters. Our framework consists of a simple directive-based programming model and a …

Cited by 32

Forma: A DSL for image processing applications to target GPUs and multi-core CPUs

M Ravishankar, J Holewinski, V Grover - … of the 8th Workshop on General …, 2015 - dl.acm.org

As architectures evolve, optimization techniques to obtain good performance evolve as well.
Using low-level programming languages like C/C++ typically results in architecture-specific …

Cited by 40
