Flux: A next-generation resource management framework for large HPC centers
DH Ahn, J Garlick, M Grondona, D Lipari… - 2014 43rd …, 2014 - ieeexplore.ieee.org
Resource and job management software is crucial to High Performance Computing (HPC)
for efficient application execution. However, current systems and approaches can no longer …
An implementation and evaluation of the MPI 3.0 one‐sided communication interface
Summary The Message Passing Interface (MPI) 3.0 standard includes a significant revision
to MPI's remote memory access (RMA) interface, which provides support for one‐sided …
Programming for exascale computers
Exascale systems will present programmers with many challenges. The authors review the
parallel programming models that are appropriate for such systems and the challenges that …
Next generation job management systems for extreme-scale ensemble computing
With the exponential growth of supercomputers in parallelism, applications are growing
more diverse, including traditional large-scale HPC MPI jobs, and ensemble workloads such …
EReinit: Scalable and efficient fault‐tolerance for bulk‐synchronous MPI applications
Scientists from many different fields have been developing Bulk‐Synchronous MPI
applications to simulate and study a wide variety of scientific phenomena. Since failure rates …
Twister2: Design of a big data toolkit
Data‐driven applications are essential to handle the ever‐increasing volume, velocity, and
veracity of data generated by sources such as the Web and Internet of Things (IoT) devices …
How I learned to stop worrying about user-visible endpoints and love MPI
MPI+threads is gaining prominence as an alternative to the traditional "MPI everywhere"
model in order to better handle the disproportionate increase in the number of cores …
A performance analysis and optimization of PMIx-based HPC software stacks
Process management libraries and runtime environments serve an important role in the
HPC application lifecycle. This work provides a roadmap for implementing a high …
Process-in-process: techniques for practical address-space sharing
The two most common parallel execution models for many-core CPUs today are
multiprocess (e.g., MPI) and multithread (e.g., OpenMP). The multiprocess model allows each …
Design and implementation for checkpointing of distributed resources using process-level virtualization
System-level checkpoint-restart is a critical technology for long-running jobs in high-
performance computing. Yet, only two approaches to checkpointing MPI applications …