Affinity-based thread and data mapping in shared memory systems
Shared memory architectures have recently experienced a large increase in thread-level
parallelism, leading to complex memory hierarchies with multiple cache memory levels and …
Characterizing communication and page usage of parallel applications for thread and data mapping
The parallelism in shared-memory systems has increased significantly with the advent and
evolution of multicore processors. Current systems include several multicore and …
kMAF: Automatic kernel-level management of thread and data affinity
One of the main challenges for parallel architectures is the increasing complexity of the
memory hierarchy, which consists of several levels of private and shared caches, as well as …
Compiler support for selective page migration in NUMA architectures
G Piccoli, HN Santos, RE Rodrigues, C Pousa… - Proceedings of the 23rd …, 2014 - dl.acm.org
Current high-performance multicore processors provide users with a non-uniform memory
access model (NUMA). These systems perform better when threads access data on memory …
Locality vs. balance: Exploring data mapping policies on NUMA systems
M Diener, EHM Cruz… - 2015 23rd Euromicro …, 2015 - ieeexplore.ieee.org
In parallel architectures that have a Non-Uniform Memory Access (NUMA) behavior, the
mapping of memory pages to NUMA nodes influences the performance of parallel …
Kernel-based thread and data mapping for improved memory affinity
Reducing the cost of memory accesses, both in terms of performance and energy
consumption, is a major challenge in shared-memory architectures. Modern systems have …
Using machine learning to optimize graph execution on NUMA machines
HMG de A. Rocha, J Schwarzrock… - Proceedings of the 59th …, 2022 - dl.acm.org
This paper proposes PredG, a Machine Learning framework to enhance the graph
processing performance by finding the ideal thread and data mapping on NUMA systems …
Boosting graph analytics by tuning threads and data affinity on NUMA systems
HMGA Rocha, J Schwarzrock… - 2021 29th Euromicro …, 2021 - ieeexplore.ieee.org
The execution of large real-world graphs, such as web searches and social networks, has
been boosted by modern HPC systems. However, their irregular communication patterns …
Effective exploration of thread throttling and thread/page mapping on NUMA systems
J Schwarzrock, HMGA Rocha… - 2020 IEEE 22nd …, 2020 - ieeexplore.ieee.org
NUMA systems have become commonly used in HPC. However, to fully take advantage of
these systems, the right thread-to-core allocation and page placement are essential. On top …
Dynamic thread mapping of shared memory applications by exploiting cache coherence protocols
In current computer architectures, the communication performance between threads varies
depending on the memory hierarchy. This performance difference must be considered when …
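Several of the works above rest on the same idea: threads that communicate heavily should run on cores that are close together in the memory hierarchy. The sketch below illustrates that idea with a simple greedy heuristic. It is not the algorithm of any paper listed here. It is a hypothetical illustration under stated assumptions: a symmetric communication matrix `comm` (for example, counts of shared cache lines between thread pairs) is already available, the number of threads is even, and cores `2k` and `2k+1` share a cache. The function name `greedy_pair_mapping` is invented for this example.

```python
from typing import List


def greedy_pair_mapping(comm: List[List[int]]) -> List[int]:
    """Return mapping[t] = core id for thread t.

    Hypothetical heuristic: repeatedly take the unmapped pair of threads
    with the largest communication volume and place them on two cores that
    are ASSUMED to share a cache (cores 2k and 2k+1).
    """
    n = len(comm)
    if n % 2 != 0:
        raise ValueError("this sketch assumes an even number of threads")

    # All thread pairs (i < j), ordered by communication volume, highest first.
    pairs = sorted(
        ((comm[i][j], i, j) for i in range(n) for j in range(i + 1, n)),
        reverse=True,
    )

    mapping = [-1] * n  # -1 means "not yet placed"
    next_core = 0
    for _, i, j in pairs:
        if mapping[i] == -1 and mapping[j] == -1:
            mapping[i] = next_core      # first core of a cache-sharing pair
            mapping[j] = next_core + 1  # its sibling core
            next_core += 2
    return mapping


# Threads 0 and 2 communicate most, as do threads 1 and 3.
comm = [
    [0, 1, 9, 0],
    [1, 0, 0, 7],
    [9, 0, 0, 2],
    [0, 7, 2, 0],
]
print(greedy_pair_mapping(comm))  # [0, 2, 1, 3]
```

Here threads 0 and 2 land on cores 0 and 1, and threads 1 and 3 on cores 2 and 3, so each heavily communicating pair shares a cache under the assumed topology. A real system would also need to measure the matrix (statically, through hardware counters, or inside the kernel, as some of the listed works do), handle deeper hierarchies than pairs of cores, and actually pin the threads through an OS interface.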