MIMD programs execution support on SIMD machines: a holistic survey

D Mustafa, R Alkhasawneh, F Obeidat… - IEEE Access, 2024 - ieeexplore.ieee.org
The Single Instruction Multiple Data (SIMD) architecture, supported by various high-
performance computing platforms, efficiently utilizes data-level parallelism. The SIMD model …

IRIS: A portable runtime system exploiting multiple heterogeneous programming systems

J Kim, S Lee, B Johnston… - 2021 IEEE High …, 2021 - ieeexplore.ieee.org
Across embedded, mobile, enterprise, and high performance computing systems, computer
architectures are becoming more heterogeneous and complex. This complexity is causing a …

CEDR: A compiler-integrated, extensible DSSoC runtime

J Mack, S Hassan, N Kumbhare… - ACM Transactions on …, 2023 - dl.acm.org
In this work, we present a Compiler-integrated, Extensible Domain Specific System on
Chip Runtime (CEDR) ecosystem to facilitate research toward addressing the challenges of …

Faster and cheaper: Parallelizing large-scale matrix factorization on GPUs

W Tan, L Cao, L Fong - Proceedings of the 25th ACM International …, 2016 - dl.acm.org
Matrix factorization (MF) is used by many popular algorithms such as collaborative filtering.
GPU with massive cores and high memory bandwidth sheds light on accelerating MF much …

Pagoda: Fine-grained GPU resource virtualization for narrow tasks

TT Yeh, A Sabne, P Sakdhnagool, R Eigenmann… - ACM SIGPLAN …, 2017 - dl.acm.org
Massively multithreaded GPUs achieve high throughput by running thousands of threads in
parallel. To fully utilize the hardware, workloads spawn work to the GPU in bulk by launching …

IRIS: A performance-portable framework for cross-platform heterogeneous computing

J Kim, S Lee, B Johnston… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
From edge to exascale, computer architectures are becoming more heterogeneous and
complex. The systems typically have fat nodes, with multicore CPUs and multiple hardware …

Compiler techniques for massively scalable implicit task parallelism

TG Armstrong, JM Wozniak, M Wilde… - SC'14: Proceedings of …, 2014 - ieeexplore.ieee.org
Swift/T is a high-level language for writing concise, deterministic scripts that compose serial
or parallel codes implemented in lower-level programming models into large-scale parallel …

Scheduling multi-tenant cloud workloads on accelerator-based systems

D Sengupta, A Goswami, K Schwan… - SC'14: Proceedings of …, 2014 - ieeexplore.ieee.org
Accelerator-based systems are making rapid inroads into becoming platforms of choice for
high end cloud services. There is a need therefore, to move from the current model in which …

Extreme-scale dynamic exploration of a distributed agent-based model with the EMEWS framework

J Ozik, NT Collier, JM Wozniak… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
Agent-based models (ABMs) integrate the multiple scales of behavior and data to produce
higher order dynamic phenomena and are increasingly used in the study of important social …

Juggler: a dependence-aware task-based execution framework for GPUs

ME Belviranli, S Lee, JS Vetter, LN Bhuyan - Proceedings of the 23rd …, 2018 - dl.acm.org
Scientific applications with single instruction, multiple data (SIMD) computations show
considerable performance improvements when run on today's graphics processing units …