Enabling efficient multithreaded MPI communication through a library-based implementation of MPI endpoints
Modern high-speed interconnection networks are designed with capabilities to support
communication from multiple processor cores. The MPI endpoints extension has been …
Multi-level load balancing with an integrated runtime approach
The recent trend of increasing numbers of cores per chip has resulted in vast amounts of
on-node parallelism. These high core counts result in hardware variability that introduces …
Enhancing MPI+OpenMP task based applications for heterogeneous architectures with GPU support
Heterogeneous supercomputers are widespread over HPC systems and programming
efficient applications on these architectures is a challenge. Task-based programming …
Introducing kernel-level page reuse for high performance computing
Due to computer architecture evolution, more and more HPC applications have to include
thread-based parallelism and take care of memory consumption. Such evolutions require …
Towards achieving transparent malleability thanks to MPI process virtualization
The field of High-Performance Computing is rapidly evolving, driven by the race for
computing power and the emergence of new architectures. Despite these changes, the …
A Distributed Version of Syrup
A portfolio SAT solver has to share clauses in order to be efficient. In a distributed
environment, such sharing implies additional problems: more information has to be …
Introducing task-containers as an alternative to runtime-stacking
The advent of many-core architectures poses new challenges to the MPI programming
model which has been designed for distributed memory message passing. It is now clear …
Thread-local storage extension to support thread-based MPI/OpenMP applications
With the advent of the multicore era, the architecture of supercomputers in HPC
(High-Performance Computing) is evolving to integrate larger computational nodes with an …
Hybrid parallel programming models for AMR neutron Monte-Carlo transport
D Dureau, G Poëtte - … MC 2013-Joint …, 2014 - sna-and-mc-2013-proceedings …
This paper deals with High Performance Computing (HPC) applied to neutron transport
theory on complex geometries, thanks to both an Adaptive Mesh Refinement (AMR) …
A methodology for assessing computation/communication overlap of MPI nonblocking collectives
By allowing computation/communication overlap, MPI nonblocking collectives (NBC) are
supposed to improve application scalability and performance. However, it is known that to …