Automated scheduling algorithm selection and chunk parameter calculation in OpenMP

A Mohammed, JHM Korndörfer… - … on Parallel and …, 2022 - ieeexplore.ieee.org
Increasing node and cores-per-node counts in supercomputers render scheduling and load
balancing critical for exploiting parallelism. OpenMP applications can achieve high …

A probabilistic machine learning approach to scheduling parallel loops with Bayesian optimization

K Kim, Y Kim, S Park - IEEE Transactions on Parallel and …, 2020 - ieeexplore.ieee.org
This article proposes Bayesian optimization augmented factoring self-scheduling (BO FSS),
a new parallel loop scheduling strategy. BO FSS is an automatic tuning variant of the …

COWS for High Performance: Cost Aware Work Stealing for Irregular Parallel Loop

P Mishra, VK Nandivada - ACM Transactions on Architecture and Code …, 2024 - dl.acm.org
Parallel libraries such as OpenMP distribute the iterations of parallel-for-loops among the
threads, using a programmer-specified scheduling policy. While the existing scheduling …

A NUMA-aware version of an adaptive self-scheduling loop scheduler

JD Booth, P Lane - ACM Transactions on Architecture and Code …, 2024 - dl.acm.org
Parallelizing code in a shared-memory environment is commonly done utilizing loop
scheduling (LS) in a fork-join manner as in OpenMP. This manner of parallelization is …

An adaptive self‐scheduling loop scheduler

J Dennis Booth, P Allen Lane - Concurrency and Computation …, 2022 - Wiley Online Library
Many shared‐memory parallel irregular applications, such as sparse linear algebra and
graph algorithms, depend on efficient loop scheduling (LS) in a fork‐join manner despite …

PackStealLB: A scalable distributed load balancer based on work stealing and workload discretization

V Freitas, LL Pilla, AL Santana, M Castro… - Journal of Parallel and …, 2021 - Elsevier
The scalability of high-performance, parallel iterative applications is directly affected by how
well they use the available computing resources. These applications are subject to load …

Chunking loops with non-uniform workloads

IK Prabhu, VK Nandivada - Proceedings of the 34th ACM International …, 2020 - dl.acm.org
Task-parallel languages such as X10 implement a dynamic lightweight task-parallel execution
model, where programmers are encouraged to express the ideal parallelism in the program …

Automated Scheduling Algorithm Selection in OpenMP

FM Ciorba, A Mohammed… - … on Parallel and …, 2023 - ieeexplore.ieee.org
Scientific and data analysis applications are increasingly complex, with evolving
computational and memory requirements during execution. Conversely, modern high …

[BOOK][B] A case for simple, low-cost autotuning heuristics for efficient performance of irregular algorithms

PA Lane - 2022 - search.proquest.com
Irregular algorithms (i.e., algorithms that make irregular memory accesses) are notorious for
being difficult to optimize on parallel architectures due to load imbalance in the number of …

ARTful: A model for user‐defined schedulers targeting multiple high‐performance computing runtime systems

A Santana, V Freitas, M Castro, LL Pilla… - Software: Practice …, 2021 - Wiley Online Library
Global schedulers are components in parallel runtime libraries that distribute the
application's workload across physical resources. More often than not, applications …