Hierarchical roofline analysis: How to collect data using performance tools on Intel CPUs and NVIDIA GPUs

C Yang - arXiv preprint arXiv:2009.02449, 2020 - arxiv.org
This paper surveys a range of methods to collect necessary performance data on Intel CPUs
and NVIDIA GPUs for hierarchical Roofline analysis. As of mid-2020, two vendor …

SOMA: Observability, monitoring, and in situ analytics for exascale applications

D Yokelson, O Lappi, S Ramesh… - Concurrency and …, 2024 - Wiley Online Library
With the rise of exascale systems and large, data-centric workflows, the need to observe
and analyze high performance computing (HPC) applications during their execution is …

Hierarchical roofline performance analysis for deep learning applications

C Yang, Y Wang, T Kurth, S Farrell… - … Computing: Proceedings of …, 2021 - Springer
This paper presents a practical methodology for collecting performance data necessary to
conduct hierarchical Roofline analysis on NVIDIA GPUs. It discusses the extension of the …

High performance computing framework for tera-scale database search of mass spectrometry data

M Haseeb, F Saeed - Nature Computational Science, 2021 - nature.com
Database peptide search algorithms deduce peptides from mass spectrometry data. There
has been substantial effort in improving their computational efficiency to achieve larger and …

Time-based roofline for deep learning performance analysis

Y Wang, C Yang, S Farrell, Y Zhang… - 2020 IEEE/ACM …, 2020 - ieeexplore.ieee.org
Deep learning applications based on neural networks are generating considerable interest
in various fields due to their high accuracy. Such an application is usually very compute …

Broad performance measurement support for asynchronous multi-tasking with APEX

KA Huck - 2022 IEEE/ACM 7th International Workshop on …, 2022 - ieeexplore.ieee.org
APEX (Autonomic Performance Environment for eXascale) is a performance measurement
library for distributed, asynchronous multitasking runtime systems. It provides support for …

TinyProf: Towards continuous performance introspection through scalable parallel I/O

K Fan, S Kesavan, S Petruzza… - ISC High Performance …, 2024 - ieeexplore.ieee.org
Performance profiling tools are crucial for HPC specialists to identify performance
bottlenecks in parallel codes at various levels of granularity (i.e., across nodes, ranks, and …

GPU-acceleration of the distributed-memory database peptide search of mass spectrometry data

M Haseeb, F Saeed - Scientific Reports, 2023 - nature.com
Database peptide search is the primary computational technique for identifying peptides
from the mass spectrometry (MS) data. Graphical Processing Units (GPU) computing is now …

Ubiquitous performance analysis

D Boehme, P Aschwanden, O Pearce, K Weiss… - … Conference, ISC High …, 2021 - Springer
In an effort to guide optimizations and detect performance regressions, developers of large
HPC codes must regularly collect and analyze application performance profiles across …

8 steps to 3.7 TFLOP/s on NVIDIA V100 GPU: Roofline analysis and other tricks

C Yang - arXiv preprint arXiv:2008.11326, 2020 - arxiv.org
Performance optimization can be a daunting task especially as the hardware architecture
becomes more and more complex. This paper takes a kernel from the Materials Science …