The TAU parallel performance system

SS Shende, AD Malony - The International Journal of High …, 2006 - journals.sagepub.com
The ability of performance technology to keep pace with the growing complexity of parallel
and distributed systems depends on robust performance frameworks that can at once …

Gravitational torques in spiral galaxies: Gas accretion as a driving mechanism of galactic evolution

DL Block, F Bournaud, F Combes, I Puerari… - Astronomy & …, 2002 - aanda.org
The distribution of gravitational torques and bar strengths in the local Universe is derived
from a detailed study of 163 galaxies observed in the near-infrared. The results are …

Reducing the overhead of direct application instrumentation using prior static analysis

J Mußler, D Lorenz, F Wolf - European Conference on Parallel Processing, 2011 - Springer
Preparing performance measurements of HPC applications is usually a tradeoff between
accuracy and granularity of the measured data. When using direct instrumentation, that is …

Advances in the TAU performance system

AD Malony, S Shende, R Bell, K Li, L Li… - Performance Analysis and …, 2004 - Springer
To address the increasing complexity in parallel and distributed systems and software,
advances in performance technology towards more robust tools and broader, more portable …

Phase-Based Parallel Performance Profiling.

AD Malony, S Shende, A Morris - PARCO, 2005 - researchgate.net
Parallel scientific applications are designed based on structural, logical, and numerical
models of computation and correctness. When studying the performance of these …

Parallel performance wizard: A performance analysis tool for partitioned global-address-space programming

HH Su, M Billingsley, AD George - 2008 IEEE International …, 2008 - ieeexplore.ieee.org
Given the complexity of parallel programs, developers often must rely on performance
analysis tools to help them improve the performance of their code. While many tools support …

Reconciling sampling and direct instrumentation for unintrusive call-path profiling of MPI programs

T Gamblin, M Schulz, BR de Supinski… - … Parallel & Distributed …, 2011 - ieeexplore.ieee.org
We can profile the performance behavior of parallel programs at the level of individual call
paths through sampling or direct instrumentation. While we can easily control measurement …

An efficient multi-level trace toolkit for multi-threaded applications

V Danjean, R Namyst, PA Wacrenier - Euro-Par 2005 Parallel Processing …, 2005 - Springer
Nowadays, observing and understanding the behavior and performance of a multi-threaded
application is a nontrivial task, especially within a complex multi-threaded environment such …

A scalable approach to MPI application performance analysis

S Moore, F Wolf, J Dongarra, S Shende… - Recent Advances in …, 2005 - Springer
A scalable approach to performance analysis of MPI applications is presented that includes
automated source code instrumentation, low overhead generation of profile and trace data …

Automatic analysis of inefficiency patterns in parallel applications

F Wolf, B Mohr, J Dongarra… - … and Computation: Practice …, 2007 - Wiley Online Library
Event tracing is a powerful method for analyzing the performance behavior of parallel
applications. Because event traces record the temporal and spatial relationships between …