The Scalasca performance toolset architecture

M Geimer, F Wolf, BJN Wylie… - Concurrency and …, 2010 - Wiley Online Library
Scalasca is a performance toolset that has been specifically designed to analyze parallel
application execution behavior on large‐scale systems with many thousands of processors …

Combing the communication hairball: Visualizing parallel execution traces using logical time

KE Isaacs, PT Bremer, I Jusufi… - IEEE transactions on …, 2014 - ieeexplore.ieee.org
With the continuous rise in complexity of modern supercomputers, optimizing the
performance of large-scale parallel programs is becoming increasingly challenging …

MUST: A scalable approach to runtime error detection in MPI programs

T Hilbrich, M Schulz, BR de Supinski… - Tools for High …, 2010 - Springer
The Message-Passing Interface (MPI) is large and complex. Therefore,
programming MPI is error prone. Several MPI runtime correctness tools address classes of …

PEMOGEN: Automatic adaptive performance modeling during program runtime

A Bhattacharyya, T Hoefler - … of the 23rd international conference on …, 2014 - dl.acm.org
Traditional means of gathering performance data are tracing, which is limited by the
available storage, and profiling, which has limited accuracy. Performance modeling is often …

An autonomic performance environment for exascale

KA Huck, A Porterfield, N Chaimov, H Kaiser… - Supercomputing …, 2015 - superfri.org
Exascale systems will require new approaches to performance observation, analysis, and
runtime decision-making to optimize for performance and efficiency. The standard "first …

Lessons learned from a performance analysis and optimization of a multiscale cellular simulation

M Clascà, M Garcia-Gasulla, A Montagud… - Proceedings of the …, 2023 - dl.acm.org
This work presents a comprehensive performance analysis and optimization of a multiscale
agent-based cellular simulation. The optimizations applied are guided by detailed …

A scalable tool architecture for diagnosing wait states in massively parallel applications

M Geimer, F Wolf, BJN Wylie, B Mohr - Parallel Computing, 2009 - Elsevier
When scaling message-passing applications to thousands of processors, their performance
is often affected by wait states that occur when processes fail to reach synchronization points …

SOMA: Observability, monitoring, and in situ analytics for exascale applications

D Yokelson, O Lappi, S Ramesh… - Concurrency and …, 2024 - Wiley Online Library
With the rise of exascale systems and large, data‐centric workflows, the need to observe
and analyze high performance computing (HPC) applications during their execution is …

Using compiler techniques to improve automatic performance modeling

A Bhattacharyya, G Kwasniewski… - … Conference on Parallel …, 2015 - ieeexplore.ieee.org
Performance modeling can be utilized in a number of scenarios, starting from finding
performance bugs to the scalability study of applications. Existing dynamic and static …

Interpreting performance data across intuitive domains

M Schulz, JA Levine, PT Bremer… - 2011 International …, 2011 - ieeexplore.ieee.org
To exploit the capabilities of current and future systems, developers must understand the
interplay between on-node performance, domain decomposition, and an application's …