OpenUH: An optimizing, portable OpenMP compiler
OpenMP has gained wide popularity as an API for parallel programming on shared memory
and distributed shared memory platforms. Despite its broad availability, there remains a …
A portable C compiler for OpenMP V. 2.0
This paper presents an overview of OMPi, a portable implementation of the OpenMP API for
C, adhering to the recently released version 2.0 of the standard. OMPi is a C-to-C translator …
Scheduling algorithms for effective thread pairing on hybrid multiprocessors
RL McGregor, CD Antonopoulos… - 19th IEEE …, 2005 - ieeexplore.ieee.org
With the latest high-end computing nodes combining shared-memory multiprocessing with
hardware multithreading, new scheduling policies are necessary for workloads consisting of …
Communication and optimization aspects of parallel programming models on hybrid architectures
R Rabenseifner, G Wellein - The International Journal of …, 2003 - journals.sagepub.com
Most HPC systems are clusters of shared memory nodes. Parallel programming must
combine the distributed memory parallelization on the node interconnect with the shared …
Running OpenMP applications efficiently on an everything-shared SDSM
Summary form only given. Traditional software distributed shared memory (SDSM) systems
modify the semantics of a real hardware shared memory system by relaxing the coherence …
An evaluation of OpenMP on current and emerging multithreaded/multicore processors
Multiprocessors based on simultaneous multithreaded (SMT) or multicore (CMP) processors
are continuing to gain a significant share in both high-performance and mainstream …
A comparison of locality transformations for irregular codes
H Han, CW Tseng - … Workshop on Languages, Compilers, and Run-Time …, 2000 - Springer
Researchers have proposed several data and computation transformations to improve
locality in irregular scientific codes. We experimentally compare their performance and …
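Coarse-grain task parallel processing on shared-memory multiprocessor systems
笠原博徳, 小幡元樹, 石坂一久 - IPSJ Journal (情報処理学会論文誌), 2001 - ipsj.ixsq.nii.ac.jp
Abstract: This paper proposes an implementation scheme for coarse-grain task parallel
processing on shared-memory multiprocessor systems that uses one-time, single-level
thread generation. Coarse-grain task parallel processing …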
Realistic workload scheduling policies for taming the memory bandwidth bottleneck of SMPs
In this paper we reformulate the thread scheduling problem on multiprogrammed SMPs.
Scheduling algorithms usually attempt to maximize performance of memory intensive …
Achieving performance under OpenMP on ccNUMA and software distributed shared memory systems
B Chapman, F Bregier, A Patil… - Concurrency and …, 2002 - Wiley Online Library
OpenMP is emerging as a viable high‐level programming model for shared memory parallel
systems. It was conceived to enable easy, portable application development on this range of …