An overview of cache optimization techniques and cache-aware numerical algorithms

M Kowarschik, C Weiß - Algorithms for memory hierarchies: advanced …, 2003 - Springer
In order to mitigate the impact of the growing gap between CPU speed and main memory
performance, today's computer architectures implement hierarchical memory structures. The …

Machine learning in compiler optimization

Z Wang, M O'Boyle - Proceedings of the IEEE, 2018 - ieeexplore.ieee.org
In the last decade, machine-learning-based compilation has moved from an obscure
research niche to a mainstream activity. In this paper, we describe the relationship between …

The TAU parallel performance system

SS Shende, AD Malony - The International Journal of High …, 2006 - journals.sagepub.com
The ability of performance technology to keep pace with the growing complexity of parallel
and distributed systems depends on robust performance frameworks that can at once …

Collecting performance data with PAPI-C

D Terpstra, H Jagode, H You, J Dongarra - Tools for High Performance …, 2010 - Springer
Modern high performance computer systems continue to increase in size and complexity.
Tools to measure application performance in these increasingly complex environments must …

Measuring energy and power with PAPI

VM Weaver, M Johnson… - 2012 41st …, 2012 - ieeexplore.ieee.org
Energy and power consumption are becoming critical metrics in the design and usage of
high performance systems. We have extended the Performance API (PAPI) analysis library …

On the detection of kernel-level rootkits using hardware performance counters

B Singh, D Evtyushkin, J Elwell, R Riley… - … of the 2017 ACM on Asia …, 2017 - dl.acm.org
Recent work has investigated the use of hardware performance counters (HPCs) for the
detection of malware running on a system. These works gather traces of HPCs for a variety …

Peta-scale phase-field simulation for dendritic solidification on the TSUBAME 2.0 supercomputer

T Shimokawabe, T Aoki, T Takaki, T Endo… - Proceedings of 2011 …, 2011 - dl.acm.org
The mechanical properties of metal materials largely depend on their intrinsic internal
microstructures. To develop engineering materials with the expected properties, predicting …

Prediction of job characteristics for intelligent resource allocation in HPC systems: a survey and future directions

Z Hou, H Shen, X Zhou, J Gu, Y Wang… - Frontiers of Computer …, 2022 - Springer
Nowadays, high-performance computing (HPC) clusters are increasingly popular. Large
volumes of job logs recording many years of operation traces have been accumulated. In the …

Goldilocks: a race and transaction-aware Java runtime

T Elmas, S Qadeer, S Tasiran - ACM SIGPLAN Notices, 2007 - dl.acm.org
Data races often result in unexpected and erroneous behavior. In addition to causing data
corruption and leading programs to crash, the presence of data races complicates the …

Linux perf_event features and overhead

VM Weaver - The 2nd international workshop on …, 2013 - s3.us.cloud-object-storage …
Events: 98K cycles
97.36% matrix_multiply libblas.so.3.0 [.] ATL_dJIK48x48x48TN48x48x0_
0.62% matrix_multiply matrix_multiply_atlas [.] …