Trends in data locality abstractions for HPC systems
The cost of data movement has always been an important concern in high performance
computing (HPC) systems. It has now become the dominant factor in terms of both energy …
An autonomic performance environment for exascale
Exascale systems will require new approaches to performance observation, analysis, and
runtime decision-making to optimize for performance and efficiency. The standard "first …
Node aware sparse matrix–vector multiplication
The sparse matrix–vector multiply (SpMV) operation is a key computational kernel in many
simulations and linear solvers. The large communication requirements associated with a …
Topology-aware resource management for HPC applications
Y Georgiou, E Jeannot, G Mercier… - Proceedings of the 18th …, 2017 - dl.acm.org
The Resource and Job Management System (RJMS) is a crucial system software part of the
HPC stack. It is responsible for efficiently delivering computing power to applications in …
Automatically distributing Eulerian and hybrid fluid simulations in the cloud
Distributing a simulation across many machines can drastically speed up computations and
increase detail. The computing cloud provides tremendous computing resources, but weak …
Process mapping on any topology with TopoMatch
E Jeannot - Journal of Parallel and Distributed Computing, 2022 - Elsevier
Process mapping (or process placement) is a useful algorithmic technique to optimize the
way applications are launched and executed onto a parallel machine. By taking into account …
Topology and affinity aware hierarchical and distributed load-balancing in Charm++
The evolution of massively parallel supercomputers makes palpable two issues in particular:
the load imbalance and the poor management of data locality in applications. Thus, with the …
Architecture-aware graph repartitioning for data-intensive scientific computing
A Zheng, A Labrinidis… - 2014 IEEE International …, 2014 - ieeexplore.ieee.org
Graph partitioning and repartitioning have been widely used by scientists to parallelize
compute- and data-intensive simulations. However, existing graph (re)partitioning algorithms …
PackStealLB: A scalable distributed load balancer based on work stealing and workload discretization
The scalability of high-performance, parallel iterative applications is directly affected by how
well they use the available computing resources. These applications are subject to load …
A memory congestion-aware MPI process placement for modern NUMA systems
MPI process placement is an important step to achieve scalable performance on modern
non-uniform memory access (NUMA) systems. A recent study on NUMA architectures has …