The TAU parallel performance system
The ability of performance technology to keep pace with the growing complexity of parallel
and distributed systems depends on robust performance frameworks that can at once …
Gravitational torques in spiral galaxies: Gas accretion as a driving mechanism of galactic evolution
The distribution of gravitational torques and bar strengths in the local Universe is derived
from a detailed study of 163 galaxies observed in the near-infrared. The results are …
Reducing the overhead of direct application instrumentation using prior static analysis
J Mußler, D Lorenz, F Wolf - European Conference on Parallel Processing, 2011 - Springer
Preparing performance measurements of HPC applications is usually a tradeoff between
accuracy and granularity of the measured data. When using direct instrumentation, that is …
Advances in the TAU performance system
To address the increasing complexity in parallel and distributed systems and software,
advances in performance technology towards more robust tools and broader, more portable …
Phase-Based Parallel Performance Profiling.
Parallel scientific applications are designed based on structural, logical, and numerical
models of computation and correctness. When studying the performance of these …
Parallel performance wizard: A performance analysis tool for partitioned global-address-space programming
HH Su, M Billingsley, AD George - 2008 IEEE International …, 2008 - ieeexplore.ieee.org
Given the complexity of parallel programs, developers often must rely on performance
analysis tools to help them improve the performance of their code. While many tools support …
Reconciling sampling and direct instrumentation for unintrusive call-path profiling of MPI programs
We can profile the performance behavior of parallel programs at the level of individual call
paths through sampling or direct instrumentation. While we can easily control measurement …
An efficient multi-level trace toolkit for multi-threaded applications
V Danjean, R Namyst, PA Wacrenier - Euro-Par 2005 Parallel Processing …, 2005 - Springer
Nowadays, observing and understanding the behavior and performance of a multi-threaded
application is a nontrivial task, especially within a complex multi-threaded environment such …
A scalable approach to MPI application performance analysis
A scalable approach to performance analysis of MPI applications is presented that includes
automated source code instrumentation, low overhead generation of profile and trace data …
Automatic analysis of inefficiency patterns in parallel applications
Event tracing is a powerful method for analyzing the performance behavior of parallel
applications. Because event traces record the temporal and spatial relationships between …