Flux: A next-generation resource management framework for large HPC centers

DH Ahn, J Garlick, M Grondona, D Lipari… - 2014 43rd …, 2014 - ieeexplore.ieee.org
Resource and job management software is crucial to High Performance Computing (HPC)
for efficient application execution. However, current systems and approaches can no longer …

An implementation and evaluation of the MPI 3.0 one‐sided communication interface

J Dinan, P Balaji, D Buntinas, D Goodell… - Concurrency and …, 2016 - Wiley Online Library
The Message Passing Interface (MPI) 3.0 standard includes a significant revision
to MPI's remote memory access (RMA) interface, which provides support for one‐sided …

Programming for exascale computers

W Gropp, M Snir - Computing in Science & Engineering, 2013 - ieeexplore.ieee.org
Exascale systems will present programmers with many challenges. The authors review the
parallel programming models that are appropriate for such systems and the challenges that …

Next generation job management systems for extreme-scale ensemble computing

K Wang, X Zhou, H Chen, M Lang, I Raicu - Proceedings of the 23rd …, 2014 - dl.acm.org
With the exponential growth of supercomputers in parallelism, applications are growing
more diverse, including traditional large-scale HPC MPI jobs, and ensemble workloads such …

EReinit: Scalable and efficient fault‐tolerance for bulk‐synchronous MPI applications

S Chakraborty, I Laguna, M Emani… - Concurrency and …, 2020 - Wiley Online Library
Scientists from many different fields have been developing Bulk‐Synchronous MPI
applications to simulate and study a wide variety of scientific phenomena. Since failure rates …

Twister2: Design of a big data toolkit

S Kamburugamuve, K Govindarajan… - Concurrency and …, 2020 - Wiley Online Library
Data‐driven applications are essential to handle the ever‐increasing volume, velocity, and
veracity of data generated by sources such as the Web and Internet of Things (IoT) devices …

How I learned to stop worrying about user-visible endpoints and love MPI

R Zambre, A Chandramowlishwaran… - Proceedings of the 34th …, 2020 - dl.acm.org
MPI+threads is gaining prominence as an alternative to the traditional "MPI everywhere"
model in order to better handle the disproportionate increase in the number of cores …

A performance analysis and optimization of PMIx-based HPC software stacks

AY Polyakov, BI Karasev, J Hursey, J Ladd… - Proceedings of the 26th …, 2019 - dl.acm.org
Process management libraries and runtime environments serve an important role in the
HPC application lifecycle. This work provides a roadmap for implementing a high …

Process-in-process: techniques for practical address-space sharing

A Hori, M Si, B Gerofi, M Takagi, J Dayal… - Proceedings of the 27th …, 2018 - dl.acm.org
The two most common parallel execution models for many-core CPUs today are
multiprocess (e.g., MPI) and multithread (e.g., OpenMP). The multiprocess model allows each …

Design and implementation for checkpointing of distributed resources using process-level virtualization

K Arya, R Garg, AY Polyakov… - 2016 IEEE International …, 2016 - ieeexplore.ieee.org
System-level checkpoint-restart is a critical technology for long-running jobs in high-
performance computing. Yet, only two approaches to checkpointing MPI applications …