The Scalasca performance toolset architecture
M Geimer, F Wolf, BJN Wylie… - Concurrency and …, 2010 - Wiley Online Library
Scalasca is a performance toolset that has been specifically designed to analyze parallel
application execution behavior on large‐scale systems with many thousands of processors …
Combing the communication hairball: Visualizing parallel execution traces using logical time
With the continuous rise in complexity of modern supercomputers, optimizing the
performance of large-scale parallel programs is becoming increasingly challenging …
MUST: A scalable approach to runtime error detection in MPI programs
Abstract The Message-Passing Interface (MPI) is large and complex. Therefore,
programming MPI is error prone. Several MPI runtime correctness tools address classes of …
Pemogen: Automatic adaptive performance modeling during program runtime
A Bhattacharyya, T Hoefler - … of the 23rd international conference on …, 2014 - dl.acm.org
Traditional means of gathering performance data are tracing, which is limited by the
available storage, and profiling, which has limited accuracy. Performance modeling is often …
An autonomic performance environment for exascale
Exascale systems will require new approaches to performance observation, analysis, and
runtime decision-making to optimize for performance and efficiency. The standard "first …
Lessons learned from a performance analysis and optimization of a multiscale cellular simulation
This work presents a comprehensive performance analysis and optimization of a multiscale
agent-based cellular simulation. The optimizations applied are guided by detailed …
A scalable tool architecture for diagnosing wait states in massively parallel applications
M Geimer, F Wolf, BJN Wylie, B Mohr - Parallel Computing, 2009 - Elsevier
When scaling message-passing applications to thousands of processors, their performance
is often affected by wait states that occur when processes fail to reach synchronization points …
SOMA: Observability, monitoring, and in situ analytics for exascale applications
With the rise of exascale systems and large, data‐centric workflows, the need to observe
and analyze high performance computing (HPC) applications during their execution is …
Using compiler techniques to improve automatic performance modeling
A Bhattacharyya, G Kwasniewski… - … Conference on Parallel …, 2015 - ieeexplore.ieee.org
Performance modeling can be utilized in a number of scenarios, starting from finding
performance bugs to the scalability study of applications. Existing dynamic and static …
Interpreting performance data across intuitive domains
To exploit the capabilities of current and future systems, developers must understand the
interplay between on-node performance, domain decomposition, and an application's …