Affinity-based thread and data mapping in shared memory systems

M Diener, EHM Cruz, MAZ Alves, POA Navaux… - ACM Computing …, 2016 - dl.acm.org
Shared memory architectures have recently experienced a large increase in thread-level
parallelism, leading to complex memory hierarchies with multiple cache memory levels and …

Characterizing communication and page usage of parallel applications for thread and data mapping

M Diener, EHM Cruz, LL Pilla, F Dupros… - Performance …, 2015 - Elsevier
The parallelism in shared-memory systems has increased significantly with the advent and
evolution of multicore processors. Current systems include several multicore and …

kMAF: Automatic kernel-level management of thread and data affinity

M Diener, EHM Cruz, POA Navaux, A Busse… - Proceedings of the 23rd …, 2014 - dl.acm.org
One of the main challenges for parallel architectures is the increasing complexity of the
memory hierarchy, which consists of several levels of private and shared caches, as well as …

Compiler support for selective page migration in NUMA architectures

G Piccoli, HN Santos, RE Rodrigues, C Pousa… - Proceedings of the 23rd …, 2014 - dl.acm.org
Current high-performance multicore processors provide users with a non-uniform memory
access model (NUMA). These systems perform better when threads access data on memory …

Locality vs. balance: Exploring data mapping policies on NUMA systems

M Diener, EHM Cruz… - 2015 23rd Euromicro …, 2015 - ieeexplore.ieee.org
In parallel architectures that have a Non-Uniform Memory Access (NUMA) behavior, the
mapping of memory pages to NUMA nodes influences the performance of parallel …

Kernel-based thread and data mapping for improved memory affinity

M Diener, EHM Cruz, MAZ Alves… - … on Parallel and …, 2015 - ieeexplore.ieee.org
Reducing the cost of memory accesses, both in terms of performance and energy
consumption, is a major challenge in shared-memory architectures. Modern systems have …

Using machine learning to optimize graph execution on NUMA machines

HMG de A. Rocha, J Schwarzrock… - Proceedings of the 59th …, 2022 - dl.acm.org
This paper proposes PredG, a Machine Learning framework to enhance the graph
processing performance by finding the ideal thread and data mapping on NUMA systems …

Boosting graph analytics by tuning threads and data affinity on NUMA systems

HMGA Rocha, J Schwarzrock… - 2021 29th Euromicro …, 2021 - ieeexplore.ieee.org
The execution of large real-world graphs, such as web searches and social networks, has
been boosted by modern HPC systems. However, their irregular communication patterns …

Effective exploration of thread throttling and thread/page mapping on NUMA systems

J Schwarzrock, HMGA Rocha… - 2020 IEEE 22nd …, 2020 - ieeexplore.ieee.org
NUMA systems have become commonly used in HPC. However, to fully take advantage of
these systems, the right thread-to-core allocation and page placement are essential. On top …

Dynamic thread mapping of shared memory applications by exploiting cache coherence protocols

EHM Cruz, M Diener, MAZ Alves… - Journal of Parallel and …, 2014 - Elsevier
In current computer architectures, the communication performance between threads varies
depending on the memory hierarchy. This performance difference must be considered when …