A tool to analyze the performance of multithreaded programs on NUMA architectures

X Liu, J Mellor-Crummey - ACM Sigplan Notices, 2014 - dl.acm.org
Almost all of today's microprocessors contain memory controllers and directly attach to
memory. Modern multiprocessor systems support non-uniform memory access (NUMA): it is …

Image based search engine using deep learning

S Jain, J Dhar - 2017 Tenth International Conference on …, 2017 - ieeexplore.ieee.org
During the previous couple of years, the World Wide Web (WWW) has become an extremely
popular information source. To successfully utilize the vast quantity of information that the …

A zero-positive learning approach for diagnosing software performance regressions

M Alam, J Gottschlich, N Tatbul… - Advances in …, 2019 - proceedings.neurips.cc
The field of machine programming (MP), the automation of the development of software, is
making notable research advances. This is, in part, due to the emergence of a wide range of …

NumaPerf: Predictive NUMA profiling

X Zhao, J Zhou, H Guan, W Wang, X Liu… - Proceedings of the 35th …, 2021 - dl.acm.org
It is extremely challenging to achieve optimal performance of parallel applications on a
NUMA architecture, which necessitates the assistance of profiling tools. However, existing …

Scientific application performance on HPC, private and public cloud resources: A case study using climate, cardiac model codes and the NPB benchmark suite

PE Strazdins, J Cai, M Atif… - 2012 IEEE 26th …, 2012 - ieeexplore.ieee.org
The ubiquity of on-demand cloud computing resources enables scientific researchers to
dynamically provision and consume compute and storage resources in response to science …

Locality-aware work stealing based on online profiling and auto-tuning for multisocket multicore architectures

Q Chen, M Guo - ACM Transactions on Architecture and Code …, 2015 - dl.acm.org
Modern mainstream high-performance computers adopt a multisocket multicore CPU architecture and
NUMA-based memory architecture. While traditional work-stealing schedulers are designed …

NUMAPROF, a NUMA memory profiler

S Valat, O Bouizi - Euro-Par 2018: Parallel Processing Workshops: Euro …, 2019 - Springer
The number of cores in HPC systems and servers has increased considerably over the last few years. In
order to also increase the available memory bandwidth and capacity, most systems became …

NUMA optimizations for algorithmic skeletons

P Metzger, M Cole, C Fensch - Euro-Par 2018: Parallel Processing: 24th …, 2018 - Springer
To address NUMA performance anomalies, programmers often resort to application specific
optimizations that are not transferable to other programs, or to generic optimizations that do …

[PDF] Integration and Optimization of a 64-core HPC for FEM and/or CFD Welding Simulations

P Lindström, A de Blanche - NAFEMS: Improving Simulation …, 2013 - diva-portal.org
This document describes the selection and integration of a computational platform
(hardware and operating system) intended for CWM analyses. Computational …

[BOOK] Abstractions for Performance Programming on Multi-Core Architectures with Hierarchical Memory

C Terboven, MS Müller, B Chapman, C Bischof - 2016 - publications.rwth-aachen.de
Parallel programming for shared-memory systems often seems quite simple
at first glance, for example inserting OpenMP …