The landscape of exascale research: A data-driven literature analysis

S Heldens, P Hijma, BV Werkhoven… - ACM Computing …, 2020 - dl.acm.org
The next generation of supercomputers will break the exascale barrier. Soon we will have
systems capable of at least one quintillion (billion billion) floating-point operations per …

Benchmarking machine learning methods for performance modeling of scientific applications

P Malakar, P Balaprakash… - 2018 IEEE/ACM …, 2018 - ieeexplore.ieee.org
Performance modeling is an important and active area of research in high-performance
computing (HPC). It helps in better job scheduling and also improves overall performance of …

Kerncraft: A tool for analytic performance modeling of loop kernels

J Hammer, J Eitzinger, G Hager, G Wellein - Tools for High Performance …, 2017 - Springer
Achieving optimal program performance requires deep insight into the interaction between
hardware and software. For software developers without an in-depth background in …

TiDA: High-level programming abstractions for data locality management

D Unat, T Nguyen, W Zhang, MN Farooqi… - … Conference, ISC High …, 2016 - Springer
The high energy costs for data movement compared to computation give paramount
importance to data locality management in programs. Managing data locality manually is not …

Prediction modeling for application-specific communication architecture design of optical NoC

J Trajkovic, S Karimi, S Hangsan, W Zhang - ACM Transactions on …, 2022 - dl.acm.org
Multi-core systems-on-chip are becoming state-of-the-art. Therefore, there is a need for a
fast and energy-efficient interconnect to take full advantage of the computational capabilities …

Automatic loop kernel analysis and performance modeling with Kerncraft

J Hammer, G Hager, J Eitzinger, G Wellein - Proceedings of the 6th …, 2015 - dl.acm.org
Analytic performance models are essential for understanding the performance
characteristics of loop kernels, which consume a major part of CPU cycles in computational …

Accelerating finite-rate chemical kinetics with coprocessors: Comparing vectorization methods on GPUs, MICs, and CPUs

CP Stone, AT Alferman, KE Niemeyer - Computer Physics Communications, 2018 - Elsevier
Accurate and efficient methods for solving stiff ordinary differential equations (ODEs) are a
critical component of turbulent combustion simulations with finite-rate chemistry. The ODEs …

PPT-Multicore: Performance prediction of OpenMP applications using reuse profiles and analytical modeling

A Barai, Y Arafa, AH Badawy, G Chennupati… - The Journal of …, 2022 - Springer
We present PPT-Multicore, an analytical model embedded in the Performance Prediction
Toolkit (PPT) to predict parallel applications' performance running on a multicore processor …

ComDetective: a lightweight communication detection tool for threads

MA Sasongko, M Chabbi, P Akhtar, D Unat - Proceedings of the …, 2019 - dl.acm.org
Inter-thread communication is a vital performance indicator in shared-memory systems. Prior
works on identifying inter-thread communication employed hardware simulators or binary …

Generating performance models for irregular applications

RD Friese, NR Tallent, A Vishnu… - 2017 IEEE …, 2017 - ieeexplore.ieee.org
Many applications have irregular behavior (e.g., input-dependent solvers, irregular memory
accesses, or unbiased branches) that cannot be captured using today's automated …