Trends in data locality abstractions for HPC systems

D Unat, A Dubey, T Hoefler, J Shalf… - … on Parallel and …, 2017 - ieeexplore.ieee.org
The cost of data movement has always been an important concern in high performance
computing (HPC) systems. It has now become the dominant factor in terms of both energy …

An autonomic performance environment for exascale

KA Huck, A Porterfield, N Chaimov, H Kaiser… - Supercomputing …, 2015 - superfri.org
Exascale systems will require new approaches to performance observation, analysis, and
runtime decision-making to optimize for performance and efficiency. The standard "first …

Node aware sparse matrix–vector multiplication

A Bienz, WD Gropp, LN Olson - Journal of Parallel and Distributed …, 2019 - Elsevier
The sparse matrix–vector multiply (SpMV) operation is a key computational kernel in many
simulations and linear solvers. The large communication requirements associated with a …

Topology-aware resource management for HPC applications

Y Georgiou, E Jeannot, G Mercier… - Proceedings of the 18th …, 2017 - dl.acm.org
The Resource and Job Management System (RJMS) is a crucial system software part of the
HPC stack. It is responsible for efficiently delivering computing power to applications in …

Automatically distributing Eulerian and hybrid fluid simulations in the cloud

O Mashayekhi, C Shah, H Qu, A Lim… - ACM Transactions on …, 2018 - dl.acm.org
Distributing a simulation across many machines can drastically speed up computations and
increase detail. The computing cloud provides tremendous computing resources, but weak …

Process mapping on any topology with TopoMatch

E Jeannot - Journal of Parallel and Distributed Computing, 2022 - Elsevier
Process mapping (or process placement) is a useful algorithmic technique to optimize the
way applications are launched and executed onto a parallel machine. By taking into account …

Topology and affinity aware hierarchical and distributed load-balancing in Charm++

E Jeannot, G Mercier, F Tessier - 2016 First International …, 2016 - ieeexplore.ieee.org
The evolution of massively parallel supercomputers makes palpable two issues in particular:
the load imbalance and the poor management of data locality in applications. Thus, with the …

Architecture-aware graph repartitioning for data-intensive scientific computing

A Zheng, A Labrinidis… - 2014 IEEE International …, 2014 - ieeexplore.ieee.org
Graph partitioning and repartitioning have been widely used by scientists to parallelize
compute- and data-intensive simulations. However, existing graph (re)partitioning algorithms …

PackStealLB: A scalable distributed load balancer based on work stealing and workload discretization

V Freitas, LL Pilla, AL Santana, M Castro… - Journal of Parallel and …, 2021 - Elsevier
The scalability of high-performance, parallel iterative applications is directly affected by how
well they use the available computing resources. These applications are subject to load …

A memory congestion-aware MPI process placement for modern NUMA systems

M Agung, MA Amrizal, K Komatsu… - 2017 IEEE 24th …, 2017 - ieeexplore.ieee.org
MPI process placement is an important step to achieve scalable performance on modern
non-uniform memory access (NUMA) systems. A recent study on NUMA architectures has …